# Using the Edge Impulse Python SDK to Upload and Download Data

<!--- Do not modify the markdown for this example directly! It is generated from a notebook in https://github.com/edgeimpulse/notebooks --->

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://docs.edgeimpulse.com/docs/tutorials/ml-and-data-engineering/ei-python-sdk/python-sdk-upload-download"><img src="https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/logo-ei-32px.png" /> View on edgeimpulse.com</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/edgeimpulse/notebooks/blob/main/notebooks/python-sdk-upload-download.ipynb"><img src="https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/logo-colab-32px.png" /> Run in Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/edgeimpulse/notebooks/blob/main/notebooks/python-sdk-upload-download.ipynb"><img src="https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/logo-github-32px.png" /> View source on GitHub</a>
  </td>
  <td>
    <a href="https://raw.githubusercontent.com/edgeimpulse/notebooks/main/notebooks/python-sdk-upload-download.ipynb" download><img src="https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/icon-download-32px.png" /> Download notebook</a>
  </td>
</table>

If you want to upload files directly to an Edge Impulse project, we recommend using the [CLI uploader tool](https://docs.edgeimpulse.com/docs/tools/edge-impulse-cli/cli-uploader). However, sometimes you cannot upload your samples directly, as you might need to convert the files to one of the accepted formats or modify the data prior to model training. Edge Impulse offers [data augmentation](https://docs.edgeimpulse.com/docs/tips-and-tricks/data-augmentation) for some types of projects, but you might want to create your own custom augmentation scheme. Or perhaps you want to [generate synthetic data](https://docs.edgeimpulse.com/docs/tutorials/ml-and-data-engineering/generate-synthetic-datasets) and script the upload process.

The Python SDK offers a set of functions to help you move data into and out of your project. This can be extremely helpful when generating or augmenting your dataset. The following cells demonstrate some of these upload and download functions.

You can find the API documentation for the functions found in this tutorial [here](https://edgeimpulse.github.io/python-sdk/source/edgeimpulse.data.html). 

> **WARNING:** This notebook will add and delete data in your Edge Impulse project, so be careful! We recommend creating a throwaway project when testing this notebook.

In [None]:
# If you have not done so already, install the following dependencies
# !python -m pip install edgeimpulse

In [None]:
from datetime import datetime
import io
import json

import edgeimpulse as ei

You will need to obtain an API key from an Edge Impulse project. Log into [edgeimpulse.com](https://edgeimpulse.com/) and create a new project. Open the project, navigate to **Dashboard** and click on the **Keys** tab to view your API keys. Double-click on the API key to highlight it, right-click, and select **Copy**.

![Copy API key from Edge Impulse project](https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/python-sdk-copy-ei-api-key.png)

Note that you do not actually need to use the project in the Edge Impulse Studio. We just need the API Key.

Paste that API key string in the `ei.API_KEY` value in the following cell:

In [None]:
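# Settings <<<<<TODO: CHANGE THESE>>>>>
# The two endpoints below point at a local development server. If you are using the
# hosted Edge Impulse service, remove them to keep the SDK's default endpoints.
ei.API_ENDPOINT = "http://localhost:4800/v1"
ei.INGESTION_ENDPOINT = "http://localhost:4810"
ei.API_KEY = "ei_3881f52c5823ebe2d885aec9b0dac995836b36fb76b50e2aaf70b4293e90fbd9" # Change this to your Edge Impulse API key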

## Upload folder

You can upload all files in a directory using the Python SDK. Note that you can set the *category*, *label*, and *metadata* for all files with a single call. If you want to use a different label for each file, set `label=None` in the function call and name your files *\<label\>.\<name\>.\<ext\>*. For example, *wave.01.csv* will have the label *wave* when uploaded. See [here](https://docs.edgeimpulse.com/docs/tools/edge-impulse-cli/cli-uploader#custom-labeling-and-metadata) for more information.
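To make the naming convention concrete, here is a minimal sketch of how a label could be derived from a filename of the form *\<label\>.\<name\>.\<ext\>*. The helper `label_from_filename` is hypothetical and written only for illustration. It is not part of the SDK, and the SDK's own parsing may differ in edge cases.

```python
from pathlib import Path


def label_from_filename(path):
    """Return the text before the first '.' in the file's name.

    Hypothetical helper for illustration only; not an Edge Impulse SDK function.
    """
    return Path(path).name.split(".", 1)[0]


# Show the label that each example filename would map to
for example in ["wave.01.csv", "dataset/capacitor.02.png"]:
    print(f"{example} -> {label_from_filename(example)}")
```

Running a check like this over your directory before uploading is a quick way to spot files whose prefix is not the label you intended.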

In [None]:
# Download image files to use as an example dataset
!mkdir -p dataset
!wget -P dataset -q \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/capacitor.01.png \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/capacitor.02.png

In [None]:
# Upload the entire directory
successes, fails = ei.data.upload_directory(
    directory="dataset",
    category="training",
    label=None, # Will use the prefix before the '.' on each filename for the label
    metadata={
        "date": datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        "source": "camera",
    }
)

# Check to make sure there were no failures
assert len(fails) == 0, "Could not upload some files"

# Save the sample IDs, as we will need these to retrieve file information and delete samples
ids = []
for sample in successes:
    ids.append(sample[0].sample_id)

If you head to the *Data acquisition* page on your project, you should see images in your dataset.

![Images uploaded to Edge Impulse project](https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/python-sdk-upload-download-images.png)

In [None]:
# Review the sample IDs and get the associated server-side filename
# Note the lack of extension! Multiple samples on the server can have the same filename.
for sample_id in ids:
    filename = ei.data.get_filename_by_id(sample_id)
    print(f"Sample ID: {sample_id}, filename: {filename}")

## Download files

You can download samples from your Edge Impulse project if you know their sample IDs. The `ei.data.get_sample_ids()` function lets you filter IDs by filename, category, and label; the following cell instead looks up IDs one filename at a time with `ei.data.get_ids_by_filename()`.

In [None]:
# Look up sample IDs by server-side filename (note: no file extension)
filenames = ["dog-ball-toy.01", "dog-ball-toy.02", "capacitor.01", "capacitor.02"]
ids = []
for filename in filenames:
    ids.extend(ei.data.get_ids_by_filename(filename))

In [None]:
# Download samples
samples = ei.data.download_samples_by_ids(ids)

# Save the downloaded files
for sample in samples:
    with open(sample.filename, "wb") as file:
        file.write(sample.data.read())

# View sample information
for sample in samples:
    print(
        f"filename: {sample.filename}\r\n"
        f"  sample ID: {sample.sample_id}\r\n"
        f"  category: {sample.category}\r\n"
        f"  label: {sample.label}\r\n"
        f"  bounding boxes: {sample.bounding_boxes}\r\n"
        f"  metadata: {sample.metadata}"
    )

Take a look at the files in this directory. You should see the downloaded images. They should match the images in the *dataset/* directory, which were the original images that we uploaded.

## Delete files

If you know the ID of the sample you would like to delete, you can call the `ei.data.delete_sample_by_id()` function. You can also delete all the samples in your project by calling `ei.data.delete_all_samples()`.

In [None]:
# Delete the samples from the Edge Impulse project that we uploaded earlier
for sample_id in ids:
    ei.data.delete_sample_by_id(sample_id)

Take a look at the data in your project. The samples that we uploaded should be gone.

## Upload folder for object detection

For object detection, you can put bounding box information (following the [Edge Impulse bounding box format](https://docs.edgeimpulse.com/reference/image-dataset-annotation-formats)) in a file named *info.labels* in the same directory as the images.

In [None]:
# Download image files to use as an example dataset
!mkdir -p dataset
!rm dataset/capacitor.01.png dataset/capacitor.02.png
!wget -P dataset -q \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/dog-ball-toy.01.png \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/dog-ball-toy.02.png \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/annotations/info.labels
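Before uploading, it can help to confirm that the images and the annotation file really sit side by side. The following is a minimal sketch using only the standard library. The helper `summarize_dataset_dir` is hypothetical (it is not an SDK function), and the demonstration runs on a throwaway directory with empty placeholder files rather than the real downloads.

```python
import tempfile
from pathlib import Path


def summarize_dataset_dir(directory, annotations_name="info.labels"):
    """List PNG images in a directory and report whether the annotation file exists.

    Hypothetical pre-upload check for illustration; not part of the Edge Impulse SDK.
    """
    root = Path(directory)
    images = sorted(p.name for p in root.glob("*.png"))
    has_annotations = (root / annotations_name).is_file()
    return images, has_annotations


# Demonstrate on a throwaway directory with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ["dog-ball-toy.01.png", "dog-ball-toy.02.png", "info.labels"]:
        (Path(tmp) / name).touch()
    images, has_annotations = summarize_dataset_dir(tmp)
    print(images, has_annotations)
```

In the notebook, you would call `summarize_dataset_dir("dataset")` after the download cell above. The directory can then presumably be uploaded with `ei.data.upload_directory()`, in the same way as in the earlier upload cell.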

## Upload CSV data

The Edge Impulse ingestion service accepts CSV files, which we can use to upload raw data. Note that if you configure a CSV template using the [CSV Wizard](https://docs.edgeimpulse.com/docs/edge-impulse-studio/data-acquisition/csv-wizard), then the expected format of the CSV file might change.

In [None]:
# Create example CSV data
# The first column is a timestamp in milliseconds (assumed 10 ms between readings)
sample_data = [
    [
        ["timestamp", "accX", "accY", "accZ"],
        [0, -9.81, 0.03, 0.21],
        [10, -9.83, 0.04, 0.27],
        [20, -9.12, 0.03, 0.23],
        [30, -9.14, 0.01, 0.25],
    ],
    [
        ["timestamp", "accX", "accY", "accZ"],
        [0, -9.56, 5.34, 1.21],
        [10, -9.43, 1.37, 1.27],
        [20, -9.22, -4.03, 1.23],
        [30, -9.50, -0.98, 1.25],
    ],
]
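
Rows like these need to be serialized to CSV before they can be uploaded. The following is a minimal sketch using only the standard library. The helper `rows_to_csv_bytes` is hypothetical, the timestamps are made up for the example, and wrapping the result in `io.BytesIO` mirrors how the JSON samples are prepared later in this tutorial.

```python
import csv
import io


def rows_to_csv_bytes(rows):
    """Serialize a list of rows (header first) to an in-memory CSV file."""
    text = io.StringIO()
    csv.writer(text, lineterminator="\n").writerows(rows)
    return io.BytesIO(text.getvalue().encode("utf-8"))


# Small stand-in dataset; the timestamps (in ms) are made up for this example
example_rows = [
    ["timestamp", "accX", "accY", "accZ"],
    [0, -9.81, 0.03, 0.21],
    [10, -9.83, 0.04, 0.27],
]

csv_file = rows_to_csv_bytes(example_rows)
print(csv_file.getvalue().decode("utf-8"))
```

The resulting object could then be used as the `data` field of a sample, in the same way as the JSON samples in the next section.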

## Upload JSON data

Another way to upload data is to encode it in JSON format. See the [data acquisition format specification](https://docs.edgeimpulse.com/reference/data-acquisition-format#data-acquisition-format-specification) for more information on acceptable key/value pairs. Note that at this time, the `signature` value can be set to `0`.

The raw data must be encoded in an IO object. We convert the dictionary objects to a `BytesIO` object, but you can also read in data from *.json* files.
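
If your data already lives in a *.json* file, one way to obtain the IO object is to read the file's bytes and wrap them in `io.BytesIO`. The sketch below is illustrative only: it writes a stand-in file to a temporary directory so that it can run on its own, and the filename and the empty `payload` are placeholders rather than a valid sample.

```python
import io
import json
import tempfile
from pathlib import Path

# Minimal stand-in document; a real file would follow the data acquisition format
example = {"protected": {"ver": "v1", "alg": "none"}, "signature": 0, "payload": {}}

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "001.json"  # hypothetical filename
    json_path.write_text(json.dumps(example), encoding="utf-8")

    # Read the raw bytes from disk and wrap them in a file-like object
    data = io.BytesIO(json_path.read_bytes())

# The buffer lives in memory, so it remains readable after the file is gone
print(json.load(data))
```

Because `io.BytesIO` copies the bytes into memory, the file can be closed or deleted before the upload happens.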

In [None]:
# Create two different example data samples
sample_data_1 = {
    "protected": {
        "ver": "v1",
        "alg": "none",
    },
    "signature": 0,
    "payload": {
        "device_name": "ac:87:a3:0a:2d:1b",
        "device_type": "DISCO-L475VG-IOT01A",
        "interval_ms": 10,
        "sensors": [
            { "name": "accX", "units": "m/s2" },
            { "name": "accY", "units": "m/s2" },
            { "name": "accZ", "units": "m/s2" }
        ],
        "values": [
            [ -9.81, 0.03, 0.21 ],
            [ -9.83, 0.04, 0.27 ],
            [ -9.12, 0.03, 0.23 ],
            [ -9.14, 0.01, 0.25 ]
        ]
    }
}
sample_data_2 = {
    "protected": {
        "ver": "v1",
        "alg": "none",
    },
    "signature": 0,
    "payload": {
        "device_name": "ac:87:a3:0a:2d:1b",
        "device_type": "DISCO-L475VG-IOT01A",
        "interval_ms": 10,
        "sensors": [
            { "name": "accX", "units": "m/s2" },
            { "name": "accY", "units": "m/s2" },
            { "name": "accZ", "units": "m/s2" }
        ],
        "values": [
            [ -9.56, 5.34, 1.21 ],
            [ -9.43, 1.37, 1.27 ],
            [ -9.22, -4.03, 1.23 ],
            [ -9.50, -0.98, 1.25 ]
        ]
    }
}

In [None]:
# Provide a filename, category, label, and optional metadata for each sample
my_samples = [
    {
        "filename": "001.json",
        "data": io.BytesIO(json.dumps(sample_data_1).encode('utf-8')),
        "category": "training",
        "label": "idle",
        "metadata": {
            "source": "accelerometer",
            "collection site": "desk",
        },
    },
    {
        "filename": "002.json",
        "data": io.BytesIO(json.dumps(sample_data_2).encode('utf-8')),
        "category": "training",
        "label": "wave",
        "metadata": {
            "source": "accelerometer",
            "collection site": "desk",
        },
    },
]

In [None]:
# Wrap the samples in instances of the Sample class
samples = [ei.data.sample_type.Sample(**i) for i in my_samples]

# Upload samples to your project
successes, fails = ei.data.upload_samples(samples)

# Check to make sure there were no failures
assert len(fails) == 0, "Could not upload some files"

# Save the sample IDs, as we will need these to retrieve file information and delete samples
ids = []
for sample in successes:
    ids.append(sample[0].sample_id)

If you head to the *Data acquisition* page on your project, you should see your time series data.

![Time series data uploaded to Edge Impulse project](https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/python-sdk-upload-download-json-data.png)

In [None]:
# Delete the samples from the Edge Impulse project
for sample_id in ids:
    ei.data.delete_sample_by_id(sample_id)