# First steps with the Remo Python library

## Create and visualize a dataset


**Let's create a new dataset and upload some annotations.**

In [2]:
import remo
import pandas as pd
remo.set_viewer('jupyter')

In [3]:
urls = ['https://remo-scripts.s3-eu-west-1.amazonaws.com/open_images_sample_dataset.zip']

my_dataset = remo.create_dataset(name='open images detection',
                                 urls=urls,
                                 annotation_task="Object detection")

Acquiring data - completed                                                                           
Processing data - completed                                                                          
Data upload completed


You can read more about which annotation tasks and formats we support in [the documentation](https://remo.ai/docs/annotation-formats/).


<div class="alert alert-info"><b> That's it!</b>
    
Our data is now easily accessible and stored in a centralised place. 
    
We can easily organise it, search through it or share it with other team members as needed.</div>

**Let's list all the datasets and retrieve one:**

In [5]:
remo.list_datasets()

[Dataset 1 - 'ocr_symbols',
 Dataset 2 - 'test',
 Dataset 8 - 'open_images',
 Dataset 9 - 'test',
 Dataset 12 - 'open images detection']

In [7]:
# Replace 1 with an ID from your own remo.list_datasets() output
new_dataset = remo.get_dataset(1)
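The listing above contains two datasets named 'test', so a name alone does not always identify a dataset. The sketch below shows one way to collect the IDs that match a name before calling `remo.get_dataset`. It runs on stand-in objects rather than a live Remo server, and it assumes that the objects returned by `remo.list_datasets()` expose `.id` and `.name` attributes. The helper `find_datasets_by_name` is hypothetical and not part of the library.

```python
from types import SimpleNamespace


def find_datasets_by_name(datasets, name):
    """Return every dataset whose .name equals `name` (names may repeat)."""
    return [ds for ds in datasets if ds.name == name]


# Stand-ins mirroring the listing above. Real Remo dataset objects are
# assumed (not confirmed here) to expose the same .id and .name attributes.
datasets = [
    SimpleNamespace(id=1, name='ocr_symbols'),
    SimpleNamespace(id=2, name='test'),
    SimpleNamespace(id=8, name='open_images'),
    SimpleNamespace(id=9, name='test'),
    SimpleNamespace(id=12, name='open images detection'),
]

matches = find_datasets_by_name(datasets, 'open images detection')
match_ids = [ds.id for ds in matches]
print(match_ids)  # [12]
```

If the assumption holds, passing `remo.list_datasets()` instead of the stand-in list gives the same result against your own server. Returning a list rather than a single item makes duplicate names such as 'test' visible instead of silently picking one.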

**We can visualise our dataset directly in Jupyter** 

(or in a separate window if you are not a fan of notebooks)


In [None]:
my_dataset.view()

![view_dataset.gif](assets/view_dataset.gif)

## Visualize Annotation Statistics

**Once data is loaded in remo, we can better understand our dataset**

For example, we can easily:
- see what's contained in the annotations
- check whether any classes are unbalanced
- spot objects that appear in only a few images

We can do this by printing the statistics of an annotation set or by using the interactive UI.

In [9]:
my_dataset.get_annotation_statistics()

[{'AnnotationSet ID': 41,
  'AnnotationSet name': 'Object detection',
  'n_images': 10,
  'n_classes': 18,
  'n_objects': 98,
  'top_3_classes': [{'name': 'Fruit', 'count': 27},
   {'name': 'Sports equipment', 'count': 12},
   {'name': 'Human arm', 'count': 10}],
  'creation_date': None,
  'last_modified_date': '2020-05-29T13:38:52.259776Z'}]

In [None]:
my_dataset.view_annotation_stats()

![annotation_statistics.png](assets/annotation_statistics.png) 
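To make the class-balance check concrete, here is a minimal sketch that loads the statistics printed above into a pandas DataFrame and computes the share of objects covered by the top three classes. The dictionary is copied from the output shown earlier; in a live notebook it could presumably be taken from `my_dataset.get_annotation_statistics()[0]`. The variable names `top` and `top3_share` are illustrative.

```python
import pandas as pd

# Statistics copied from the output above (date fields omitted).
stats = {
    'AnnotationSet ID': 41,
    'AnnotationSet name': 'Object detection',
    'n_images': 10,
    'n_classes': 18,
    'n_objects': 98,
    'top_3_classes': [
        {'name': 'Fruit', 'count': 27},
        {'name': 'Sports equipment', 'count': 12},
        {'name': 'Human arm', 'count': 10},
    ],
}

# One row per class, plus each class's share of all annotated objects.
top = pd.DataFrame(stats['top_3_classes'])
top['share'] = (top['count'] / stats['n_objects']).round(3)

# Fraction of all objects that belong to the three most frequent classes.
top3_share = top['count'].sum() / stats['n_objects']

print(top)
print(f"Top 3 classes cover {top3_share:.0%} of all objects")
# Top 3 classes cover 50% of all objects
```

Here 3 of the 18 classes account for half of the 98 objects, which suggests the remaining classes are sparsely represented.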

## Export Annotations

**We can export annotations in a standardised format, and feed them directly to a model or use them for further analysis**

In [12]:
my_dataset.export_annotations_to_file('output.csv', annotation_format='csv')
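For further analysis, the exported file can be loaded back with pandas. The column layout of the CSV that Remo writes is not shown in this tutorial, so the sketch below runs on a small stand-in file with made-up rows and hypothetical column names (`file_name`, `class_name`). Inspect `pd.read_csv('output.csv').columns` first and adjust the column name to match. The helper `count_values` is illustrative only.

```python
import os
import tempfile

import pandas as pd


def count_values(csv_path, column):
    """Load a CSV file and count how often each value appears in `column`."""
    df = pd.read_csv(csv_path)
    return df[column].value_counts()


# Stand-in file: the rows and column names are invented for this example
# and are NOT the confirmed layout of Remo's CSV export.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.csv')
    with open(demo_path, 'w') as f:
        f.write('file_name,class_name\n'
                'a.jpg,Fruit\n'
                'a.jpg,Fruit\n'
                'b.jpg,Human arm\n')
    counts = count_values(demo_path, 'class_name')

print(counts.to_dict())  # {'Fruit': 2, 'Human arm': 1}
```

With the real export, calling `count_values('output.csv', ...)` with the actual class column would give per-class object counts across the whole annotation set.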

## Further functionalities

<div class="alert alert-info"><b>You can refer to other tutorials and the documentation to further explore the library and see how to use it to better manage your datasets.</b>
    
Some of the other things you can do include:

- Easily experimenting with the choice of annotations
- Custom uploading of annotations and predictions, with joint visualization
- Advanced image search by classes, tags and filenames</div>