# Data Collection for Mildew Detection in Cherry Leaves

## Objectives
* Fetch the dataset of cherry leaf images from the provided source and examine its structure.
* Save the raw image data in an organized directory structure for easy access in later stages.

## Inputs
* Dataset URL or access key if the dataset is hosted on platforms like Kaggle.

## Outputs
* Directory structure containing the raw dataset divided into training, validation, and test sets.

## Additional Comments
* Ensure compliance with any data use agreements or NDAs associated with the dataset.
---

# Import packages


In [None]:
# Path is relative to the notebook's folder, which is assumed to be a subfolder of the project root
%pip install -r ../requirements.txt

In [None]:
import numpy
import os

---

# Change working directory

* We assume the notebooks are stored in a subfolder of the project, so when you run a notebook in the editor you need to change the working directory to the project root.

We need to change the working directory from its current folder to its parent folder.
* We access the current directory with os.getcwd().

In [None]:
import os
current_dir = os.getcwd()
current_dir

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() sets the new current directory

In [None]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")
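The cell above always moves to the parent of the current folder, so running it a second time climbs one level too far. Below is a minimal sketch of a guard against that. It assumes the project root is the folder that contains requirements.txt. The helper name ensure_repo_root and its marker parameter are hypothetical and are not part of the original notebook.

```python
import os


def ensure_repo_root(marker: str = "requirements.txt") -> str:
    """Move to the parent directory unless the current one already contains `marker`.

    Hypothetical helper: assumes the project root is identified by `marker`,
    so calling it again once the root is reached leaves the directory unchanged.
    """
    current = os.getcwd()
    if not os.path.exists(os.path.join(current, marker)):
        # Not at the root yet: go up one level, like os.chdir(os.path.dirname(current_dir))
        os.chdir(os.path.dirname(current))
    return os.getcwd()
```

In the notebook, calling ensure_repo_root() would take the place of the os.chdir(...) line, and re-running the cell would then be harmless.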

Confirm the new current directory

In [None]:
current_dir = os.getcwd()
current_dir

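Make sure your user has write permission on the dataset folder (Unix-like OS only). The folder only exists after the dataset has been downloaded, so on a first run the command is skipped.

In [None]:
# Changing file permissions (for Unix-like OS): add write permission for the owner, recursively
! [ -d inputs/cherry_leaves_dataset ] && chmod -R u+w inputs/cherry_leaves_dataset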
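The chmod command requires a Unix shell. Below is a plain-Python sketch of the same recursive u+w change that uses os.walk, os.chmod and the stat.S_IWUSR flag. The name add_user_write is hypothetical. On Windows, os.chmod can only toggle the read-only flag, so there this is an approximation at best.

```python
import os
import stat


def add_user_write(path: str) -> None:
    """Recursively add the owner's write bit, like `chmod -R u+w path`.

    Does nothing if `path` does not exist yet (for example, before the download).
    Symbolic links are skipped rather than followed.
    """
    if not os.path.exists(path):
        return
    # Top-level folder first, then everything below it
    os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            os.chmod(full, os.stat(full).st_mode | stat.S_IWUSR)
```

For example, add_user_write("inputs/cherry_leaves_dataset") performs the same change as the shell command.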

---

# Install Kaggle

In [None]:
# install kaggle package
%pip install kaggle==1.5.12

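Run the cell below **to change the Kaggle configuration directory to the current working directory and set permissions for the Kaggle authentication JSON**. This assumes you have already downloaded your API token (kaggle.json) from your Kaggle account settings and placed it in the current working directory.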

In [None]:
os.environ['KAGGLE_CONFIG_DIR'] = os.getcwd()
! chmod 600 kaggle.json
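The two lines above assume kaggle.json is already in the working directory. The sketch below is a slightly more defensive version. It checks the file first, then applies the same 600 permissions with os.chmod and sets KAGGLE_CONFIG_DIR. The helper name configure_kaggle is hypothetical. The field check assumes the token file holds "username" and "key" entries, as the token downloaded from a Kaggle account does.

```python
import json
import os
import stat


def configure_kaggle(config_dir: str) -> str:
    """Validate kaggle.json in `config_dir`, restrict its permissions, and point the CLI at it."""
    token_path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(token_path):
        raise FileNotFoundError(f"No kaggle.json found in {config_dir}")

    with open(token_path) as fh:
        credentials = json.load(fh)
    missing = {"username", "key"} - set(credentials)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")

    # Owner read/write only: the equivalent of `chmod 600 kaggle.json`
    os.chmod(token_path, stat.S_IRUSR | stat.S_IWUSR)
    # Child processes (such as `! kaggle ...`) inherit this environment variable
    os.environ["KAGGLE_CONFIG_DIR"] = config_dir
    return token_path
```

In the notebook this would be called as configure_kaggle(os.getcwd()).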

* Get the dataset path from the [Kaggle URL](https://www.kaggle.com/datasets/codeinstitute/cherry-leaves/data). When you are viewing the dataset on Kaggle, look at the part of the URL after https://www.kaggle.com/ (in some cases after kaggle.com/datasets/). Copy only the owner/dataset-slug pair, for example codeinstitute/cherry-leaves, into KaggleDatasetPath. A trailing page tab such as /data belongs to the web page, not to the dataset reference.
* Set your destination folder in DestinationFolder.

Set the Kaggle dataset path and download it.

In [None]:
KaggleDatasetPath = "codeinstitute/cherry-leaves"
DestinationFolder = "inputs/cherry_leaves_dataset"
! kaggle datasets download -d {KaggleDatasetPath} -p {DestinationFolder}
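It is easy to copy the reference incorrectly by hand when the URL ends in a page tab such as /data. The sketch below derives the owner/dataset-slug reference from a dataset URL with urllib.parse. The helper dataset_ref_from_url is hypothetical. It assumes URLs of the form https://www.kaggle.com/datasets/<owner>/<slug>/... or the same form without /datasets.

```python
from urllib.parse import urlparse


def dataset_ref_from_url(url: str) -> str:
    """Return "owner/slug" from a Kaggle dataset URL, dropping any trailing page tab."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if parts and parts[0] == "datasets":
        parts = parts[1:]  # newer URLs put the reference after /datasets/
    if len(parts) < 2:
        raise ValueError(f"Cannot find owner/slug in {url!r}")
    owner, slug = parts[0], parts[1]
    return f"{owner}/{slug}"


print(dataset_ref_from_url("https://www.kaggle.com/datasets/codeinstitute/cherry-leaves/data"))
# codeinstitute/cherry-leaves
```

You could then write KaggleDatasetPath = dataset_ref_from_url(...) instead of typing the reference by hand.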

Unzip the downloaded file, and delete the zip file.

In [None]:
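import zipfile

# The Kaggle CLI saves the archive as <dataset-slug>.zip inside the destination folder
zip_path = os.path.join(DestinationFolder, "cherry-leaves.zip")
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall(DestinationFolder)

# Remove the archive once its contents have been extracted
os.remove(zip_path)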
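The first objective of this notebook is to examine the structure of the extracted data. The sketch below walks the destination folder and counts image files per subfolder. The helper name and the set of extensions are assumptions, so adjust them to what the archive actually contains.

```python
import os

# Assumed image extensions; compared case-insensitively
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_folder(root: str) -> dict:
    """Map each folder under `root` (relative path) to the number of image files directly inside it."""
    counts = {}
    for folder, _subdirs, files in os.walk(root):
        n_images = sum(
            1 for name in files
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
        if n_images:
            counts[os.path.relpath(folder, root)] = n_images
    return counts
```

In the notebook, printing sorted(count_images_per_folder(DestinationFolder).items()) gives a quick overview of how many images each class folder holds.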

---