# How to import an online dataset

First, get the URL from which the dataset can be downloaded, then download it with `curl`. The `-L` flag makes `curl` follow redirects, and `-o` sets the name of the output file.

```shell
curl -L -o <filename> <url>
```
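
If `curl` is not available, the same download can be sketched in Python with the standard library. This is a minimal sketch, not a replacement for the command above: `download_file` is a hypothetical helper name, and it assumes the URL can be fetched without authentication (`urllib` follows HTTP redirects by default, much like `curl -L`).

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, destination: str) -> Path:
    """Download `url` to `destination` and return the destination path.

    Hypothetical helper for illustration; assumes no authentication is needed.
    """
    target = Path(destination)
    # Create the parent folder (e.g. "datasets/") if it does not exist yet
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of loading it all into memory
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling `download_file(url, "datasets/archive.zip")` with the dataset URL would write to the same location as the `-o` option of `curl`.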

> Remember to put `!` before shell commands such as `curl` in a notebook cell; this tells the notebook that we are executing a shell command.

## Download the datasets

In [13]:
!curl -L -o datasets/archive.zip https://www.kaggle.com/api/v1/datasets/download/tunguz/online-retail

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

 77 7296k   77 5664k    0     0  4881k      0  0:00:01  0:00:01 --:--:-- 4881k
100 7296k  100 7296k    0     0  5595k      0  0:00:01  0:00:01 --:--:-- 11.1M
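
The download above is a zip archive, so it has to be extracted before it can be read. A minimal sketch using Python's standard `zipfile` module follows; `extract_archive` is a hypothetical helper name, and the names of the files inside the archive are not assumed here, so the function returns them.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract every file in the zip archive and return the member names.

    Hypothetical helper for illustration only.
    """
    target = Path(target_dir)
    # Create the target folder if it does not exist yet
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()  # files contained in the archive
        archive.extractall(target)
    return names
```

For the archive downloaded above, `extract_archive("datasets/archive.zip", "datasets")` would place the extracted files next to the archive and return their names.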


## Read the dataset

After unzipping the dataset (if necessary), the next step is to get its correct path and read the data. You can read files of different types with pandas:

```python
import pandas as pd

# CSV files
data_csv = pd.read_csv(path)

# Excel files
data_excel = pd.read_excel(path)

# JSON files
data_json = pd.read_json(path)
```

> WARNING!
>
> If everything seems correct but you get a `UnicodeDecodeError`, pandas encountered bytes in the CSV file that it cannot decode with the default encoding, `utf-8`.
>
> To fix it, specify the encoding manually when loading the file. In many cases, CSV files that trigger this error are encoded in `ISO-8859-1` (also known as `latin1`) or `windows-1252`.

Example error: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 227179: invalid start byte`
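
When the encoding is unknown, one option is to try a few candidates in order and keep the first one that decodes the whole file. The sketch below uses only the standard library; `find_encoding` and its candidate list are assumptions for illustration, not a pandas feature. Note that `latin1` accepts any byte sequence, so it goes last, and a successful decode does not guarantee that the characters are the intended ones.

```python
def find_encoding(path, candidates=("utf-8", "cp1252", "latin1")):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper: "cp1252" is Python's name for windows-1252.
    """
    for encoding in candidates:
        try:
            with open(path, encoding=encoding) as handle:
                # Read in chunks; decoding errors surface while reading
                while handle.read(1 << 20):
                    pass
            return encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"none of {candidates} can decode {path}")
```

The result can then be passed on, for example `pd.read_csv(path, encoding=find_encoding(path))`.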


In [33]:
import os
import pandas as pd

# Get the current working directory (usually the folder containing the notebook)
os_path = os.getcwd()
# Build the path to the dataset file in a platform-independent way
dataset_path = os.path.join(os_path, 'datasets', 'Online_Retail.csv')
retail_data = pd.read_csv(dataset_path, encoding='ISO-8859-1')
print(retail_data)

       InvoiceNo StockCode                          Description  Quantity  \
0         536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6   
1         536365     71053                  WHITE METAL LANTERN         6   
2         536365    84406B       CREAM CUPID HEARTS COAT HANGER         8   
3         536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6   
4         536365    84029E       RED WOOLLY HOTTIE WHITE HEART.         6   
...          ...       ...                                  ...       ...   
541904    581587     22613          PACK OF 20 SPACEBOY NAPKINS        12   
541905    581587     22899         CHILDREN'S APRON DOLLY GIRL          6   
541906    581587     23254        CHILDRENS CUTLERY DOLLY GIRL          4   
541907    581587     23255      CHILDRENS CUTLERY CIRCUS PARADE         4   
541908    581587     22138        BAKING SET 9 PIECE RETROSPOT          3   

          InvoiceDate  UnitPrice  CustomerID         Country  
0        12/

# How to import a dataset from Google Drive

In Google Colab, mount your Google Drive first; the file can then be read from its path under `/content/drive/MyDrive`.

```python
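import pandas as pd
from google.colab import drive

# Mount Google Drive into the Colab filesystem
drive.mount('/content/drive')
# Path to the file inside "My Drive"
path = '/content/drive/MyDrive/online_retail.csv'

data = pd.read_csv(path)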
```