<a href="https://colab.research.google.com/github/solarslurpi/GrowBuddies/blob/main/GrowBuddies/growbuddiesproject/growbuddies/drgrowbuddy/DrGrowBuddy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Journey Begins
Today is 1/16/2022. MLK day and all that. I'm starting my DrGrowBuddy journey. The first thing I want to do is run a train/validation/test cycle using two categories: healthy and unhealthy. That will get me moving down the data pipeline.
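To make the train/validation/test idea concrete, here is a minimal sketch of a reproducible three-way split. The function name `split_dataset`, the 70/15/15 fractions, the seed, and the made-up file names are all my assumptions for illustration; nothing here comes from the linked repository.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T],
    train_frac: float = 0.7,   # assumed fraction, not a recommendation
    val_frac: float = 0.15,    # assumed fraction; the remainder becomes the test set
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items with a fixed seed and cut them into train/val/test lists."""
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must leave room for a test set")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical (file name, label) pairs standing in for real images.
samples = [
    (f"img_{i}.jpg", "healthy" if i % 2 == 0 else "unhealthy") for i in range(100)
]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

A plain shuffle like this does not balance the two classes across the splits; a stratified split would be the next refinement if the healthy and unhealthy counts turn out to be lopsided.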

I found [this PyTorch GitHub repository](https://github.com/abdullahselek/plant-disease-classification-pytorch) with Python code for plant disease detection. Happy day!

## Import Libraries

In [None]:
import torch

In [None]:
# CPU or GPU
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"The device running the code is the {DEVICE}.")

The device running the code is the cpu


## Get Data
This is interesting because my goal is to diagnose cannabis plant nutrient deficiencies first, then move on to diseases, and then pests. I don't have many images of my own, so my options for building a large image set are scraping the internet, using existing plant datasets, and using my own images. Let's see...

### Existing Plant Data sets
Promising data sets include:
- [PlantDoc dataset](https://github.com/pratikkayal/PlantDoc-Dataset)
- [Plant Leaf dataset](https://data.mendeley.com/datasets/tywbtsjrjv/1)
- [Plant Village dataset](https://knowyourdata-tfds.withgoogle.com/#tab=STATS&dataset=plant_village)  

#### PlantDoc Dataset
##### GitHub
https://github.com/pratikkayal/PlantDoc-Dataset    

##### License
Creative Commons Attribution 4.0 International [Link](https://github.com/pratikkayal/PlantDoc-Dataset/blob/master/LICENSE.txt).

##### Bibtex
@inproceedings{10.1145/3371158.3371196,
author = {Singh, Davinder and Jain, Naman and Jain, Pranjali and Kayal, Pratik and Kumawat, Sudhakar and Batra, Nipun},
title = {PlantDoc: A Dataset for Visual Plant Disease Detection},
year = {2020},
isbn = {9781450377386},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3371158.3371196},
doi = {10.1145/3371158.3371196},
booktitle = {Proceedings of the 7th ACM IKDD CoDS and 25th COMAD},
pages = {249–253},
numpages = {5},
keywords = {Deep Learning, Object Detection, Image Classification},
location = {Hyderabad, India},
series = {CoDS COMAD 2020}
}
##### Paper
[PlantDoc: A Dataset for Visual Plant Disease Detection](https://arxiv.org/abs/1911.10317)

__From the abstract:__
India loses 35% of the annual crop yield due to plant diseases. Early detection of plant diseases remains difficult due to the lack of lab infrastructure and expertise. In this paper, we explore the possibility of computer vision approaches for scalable and early plant disease detection. The lack of availability of sufficiently large-scale non-lab data set remains a major challenge for enabling vision based plant disease detection. Against this background, we present __PlantDoc: a dataset for visual plant disease detection. Our dataset contains 2,598 data points in total across 13 plant species and up to 17 classes of diseases, involving approximately 300 human hours of effort in annotating internet scraped images.__ To show the efficacy of our dataset, we learn 3 models for the task of plant disease classification. Our results show that modelling using our dataset can increase the classification accuracy by up to 31%. We believe that our dataset can help reduce the entry barrier of computer vision techniques in plant disease detection.
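Since my first run only needs two categories, the PlantDoc class folders would have to be collapsed into healthy and unhealthy. Here is a minimal sketch of one way to do that. It assumes that a folder name containing a disease or pest word (for example `Tomato leaf late blight`) is unhealthy and that a bare plant name (for example `Tomato leaf`) is healthy. The keyword list and the helper `to_binary_label` are my own guesses and should be checked against the actual folder names.

```python
# Assumed disease/pest keywords; a guess based on folder names, not an official list.
UNHEALTHY_KEYWORDS = (
    "blight", "rust", "spot", "mold", "mildew", "scab", "virus", "rot", "mites",
)


def to_binary_label(class_name: str) -> str:
    """Map a class folder name to 'healthy' or 'unhealthy' by keyword match."""
    # Normalise case and underscores so 'Bell_pepper leaf spot' splits into words.
    words = class_name.lower().replace("_", " ").split()
    if any(word in UNHEALTHY_KEYWORDS for word in words):
        return "unhealthy"
    return "healthy"


for name in ["Tomato leaf", "Tomato leaf late blight", "Bell_pepper leaf spot"]:
    print(f"{name!r} -> {to_binary_label(name)}")
```

Matching on whole words keeps the rule easy to audit, but any folder whose name uses a word outside the list would be silently labeled healthy, so it is worth printing the full mapping once and reading through it.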

#### Explore PlantDoc Dataset


In [34]:
from pathlib import Path
# Earlier os.walk-based version, kept for reference (it would need `import os`):
# def walk_through_dir(dir_path):
#   """
#   Walks through dir_path returning its contents.
#   Args:
#     dir_path (str or pathlib.Path): target directory
#
#   Returns:
#     A print out of:
#       number of subdirectories in dir_path
#       number of images (files) in each subdirectory
#       name of each subdirectory
#   """
#   for dirpath, dirnames, filenames in os.walk(dir_path):
#     print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

def walk_through_dir(dir_path):
    """Return [entry_count, 0, name] for every directory below dir_path.

    entry_count covers files and subdirectories together; the 0 is an
    unused placeholder column.
    """
    data = []
    for dirpath in Path(dir_path).rglob('*'):
        if dirpath.is_dir():
            data.append([len(list(dirpath.iterdir())), 0, dirpath.name])
    return data
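The second column returned by `walk_through_dir` is always 0, and the first column lumps files and subdirectories together. Below is a sketch of a variant that reports the two counts separately. The name `count_dirs_and_files` and the throwaway directory tree are hypothetical, used only to show the idea.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def count_dirs_and_files(dir_path) -> List[Tuple[int, int, str]]:
    """Return (subdirectory count, file count, name) for each directory below dir_path."""
    rows = []
    # sorted() gives a stable order; rglob alone yields filesystem order.
    for path in sorted(Path(dir_path).rglob("*")):
        if path.is_dir():
            children = list(path.iterdir())
            n_dirs = sum(1 for child in children if child.is_dir())
            n_files = sum(1 for child in children if child.is_file())
            rows.append((n_dirs, n_files, path.name))
    return rows


# Tiny throwaway tree (made-up file names) to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    leaf_dir = Path(tmp) / "train" / "Tomato leaf"
    leaf_dir.mkdir(parents=True)
    (leaf_dir / "a.jpg").touch()
    (leaf_dir / "b.jpg").touch()
    rows = count_dirs_and_files(tmp)

print(rows)
```

Like the original, this skips `dir_path` itself and only reports the directories underneath it.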


In [None]:
# Download data files from GitHub
from typing import Optional
from pathlib import Path
import zipfile
import urllib.request
from concurrent.futures import ThreadPoolExecutor

class Content:
    def __init__(self, base_dir: Optional[Path] = None) -> None:
        self.base_dir = base_dir or Path.cwd()
    
    def create_dir(self, path: Optional[Path] = None) -> None:
        path = path or self.base_dir
        path.mkdir(parents=True, exist_ok=True)
    
    def get_file_path(self, url: str) -> Path:
        parts = Path(url).parts
        filename = parts[-1]
        return self.base_dir / filename

class Downloader:
    def __init__(self, url: str, content: Content) -> None:
        self.url = url
        self.file_path = content.get_file_path(self.url)
    
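    def download(self) -> None:
        # Leaving the `with` block waits for the task; result() re-raises any download error.
        with ThreadPoolExecutor() as executor:
            future = executor.submit(urllib.request.urlretrieve, self.url, self.file_path)
            future.result()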

    def extract(self) -> None:
        with zipfile.ZipFile(self.file_path, 'r') as zip_ref:
            zip_ref.extractall(self.file_path.parent)
        self.file_path.unlink()





In [30]:
# Download the PlantDoc archive from GitHub and unzip it into the current directory.
url = "https://github.com/pratikkayal/PlantDoc-Dataset/archive/master.zip"
content_dir = Path(".")
content = Content(content_dir)
content.create_dir()
downloader = Downloader(url, content)
downloader.download()
downloader.extract()
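Re-running the cell above fetches the whole archive again. Here is a minimal sketch of a guard that checks whether the extracted folder already has content. The helper `needs_download` is hypothetical, and the demo uses a temporary directory with a stand-in file so that it runs without network access.

```python
import tempfile
from pathlib import Path


def needs_download(extract_dir: Path) -> bool:
    """Return True when extract_dir is missing or empty (hypothetical helper)."""
    return not extract_dir.is_dir() or not any(extract_dir.iterdir())


# Demo on a throwaway directory so the sketch runs without network access.
with tempfile.TemporaryDirectory() as tmp:
    dataset_dir = Path(tmp) / "PlantDoc-Dataset-master"
    before = needs_download(dataset_dir)   # directory does not exist yet
    dataset_dir.mkdir()
    (dataset_dir / "README.md").touch()    # stand-in for extracted content
    after = needs_download(dataset_dir)

print(before, after)
```

In the notebook, the `downloader.download()` and `downloader.extract()` calls could sit under `if needs_download(content_dir / "PlantDoc-Dataset-master"):`, assuming the archive always unpacks into a folder with that name.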

In [35]:
# Check out the directory structure.
data = walk_through_dir("/content/PlantDoc-Dataset-master")
data

[[28, 0, 'train'],
 [27, 0, 'test'],
 [106, 0, 'Blueberry leaf'],
 [57, 0, 'grape leaf'],
 [109, 0, 'Potato leaf early blight'],
 [106, 0, 'Corn rust leaf'],
 [57, 0, 'Soyabean leaf'],
 [85, 0, 'Tomato mold leaf'],
 [64, 0, 'Corn Gray leaf spot'],
 [103, 0, 'Peach leaf'],
 [82, 0, 'Apple leaf'],
 [97, 0, 'Potato leaf late blight'],
 [47, 0, 'Cherry leaf'],
 [112, 0, 'Raspberry leaf'],
 [2, 0, 'Tomato two spotted spider mites leaf'],
 [62, 0, 'Bell_pepper leaf spot'],
 [101, 0, 'Tomato leaf late blight'],
 [79, 0, 'Tomato Early blight leaf'],
 [124, 0, 'Squash Powdery mildew leaf'],
 [83, 0, 'Apple Scab Leaf'],
 [70, 0, 'Tomato leaf yellow virus'],
 [56, 0, 'grape leaf black rot'],
 [88, 0, 'Strawberry leaf'],
 [79, 0, 'Apple rust leaf'],
 [101, 0, 'Tomato leaf bacterial spot'],
 [44, 0, 'Tomato leaf mosaic virus'],
 [55, 0, 'Tomato leaf'],
 [140, 0, 'Tomato Septoria leaf spot'],
 [53, 0, 'Bell_pepper leaf'],
 [180, 0, 'Corn leaf blight'],
 [11, 0, 'Blueberry leaf'],
 [12, 0, 'grape lea