## Importing the library

In [1]:
import pandas as pd

## Reading in data
To load data into Python, use one of pandas's `read_..` functions. There is one for each of many common file formats.

For example, to load data from an Excel file:
```python
excel_file = pd.ExcelFile('path_to_excel.xlsx')
df = pd.read_excel(excel_file, 'Sheet1')
```
For other file formats, type ```pd.read_``` and then press the \<tab\> key to see the available functions.
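
Outside an interactive session, where tab completion is not available, the same list can be produced programmatically. This is a minimal sketch that assumes only that pandas is installed; the variable name `readers` is illustrative.

```python
import pandas as pd

# Collect every top-level pandas attribute whose name starts with "read_"
readers = sorted(name for name in dir(pd) if name.startswith('read_'))
print(readers)
```

The exact list depends on the installed pandas version.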

**NOTE**: be careful when using `\` (backslash) as a path separator (the default on Windows): inside a Python string it "escapes" the following character to give it a new meaning, e.g. `\n` means new line.
If you need backslashes, put an `r` in front of the string so that Python reads it "raw".
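
To make the difference concrete, the sketch below compares a normal string and a raw string built from the same characters. The path `data\new_wines.csv` is hypothetical and chosen only because it contains `\n`; the `pathlib` alternative is a suggestion, not something the cells below rely on.

```python
from pathlib import Path

# Hypothetical Windows-style path, used only for illustration
escaped = 'data\new_wines.csv'   # "\n" is interpreted as a newline character
raw = r'data\new_wines.csv'      # raw string: the backslash is kept literally

print(repr(escaped))
print(repr(raw))

# Alternative: build the path with pathlib, which picks the separator for the OS
portable = Path('data') / 'new_wines.csv'
print(portable)
```

Forward slashes (`/`) in path strings are also accepted by Python on Windows, which avoids the escaping problem altogether.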

In [2]:
pd.read_json(r'data\wine-reviews\winemag-data-130k-v2_tiny.json')

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
10,US,"Soft, supple plum envelopes an oaky structure ...",Mountain Cuvée,87,19.0,California,Napa Valley,Napa,Virginie Boone,@vboone,Kirkland Signature 2011 Mountain Cuvée Caberne...,Cabernet Sauvignon,Kirkland Signature
100,US,"Fresh apple, lemon and pear flavors are accent...",,88,18.0,New York,Finger Lakes,Finger Lakes,Anna Lee C. Iijima,,Ventosa 2015 Pinot Gris (Finger Lakes),Pinot Gris,Ventosa
101,US,"Dusty mineral, smoke and struck flint lend a s...",Red Oak Vineyard,87,20.0,New York,Finger Lakes,Finger Lakes,Anna Lee C. Iijima,,Lamoreaux Landing 2014 Red Oak Vineyard Riesli...,Riesling,Lamoreaux Landing
102,US,Intensely smoky tones of struck flint and ash ...,Yellow Dog Vineyard,87,20.0,New York,Finger Lakes,Finger Lakes,Anna Lee C. Iijima,,Lamoreaux Landing 2014 Yellow Dog Vineyard Rie...,Riesling,Lamoreaux Landing
103,Chile,A bright nose with green apple and citric arom...,Single Vineyard Falaris Hill,87,18.0,Leyda Valley,,,Michael Schachner,@wineschach,Leyda 2015 Single Vineyard Falaris Hill Chardo...,Chardonnay,Leyda
104,Italy,"Made with 65% Sangiovese, 20% Merlot and 15% C...",Nativo,87,16.0,Tuscany,Toscana,,Kerin O’Keefe,@kerinokeefe,Madonna Alta 2014 Nativo Red (Toscana),Red Blend,Madonna Alta
105,Italy,Made predominantly with Trebbiano and Malvasia...,Villa Antinori,87,14.0,Tuscany,Toscana,,Kerin O’Keefe,@kerinokeefe,Marchesi Antinori 2015 Villa Antinori White (T...,White Blend,Marchesi Antinori
106,Italy,"A blend of Cabernet Sauvignon, Merlot, Caberne...",Castiglioni,87,30.0,Tuscany,Toscana,,Kerin O’Keefe,@kerinokeefe,Marchesi de' Frescobaldi 2014 Castiglioni Red ...,Red Blend,Marchesi de' Frescobaldi


### Additional options
All the read functions accept additional useful options.

The main ones for `read_csv` (some of which are shared with other `read_..` functions):
- `sep`: defines the type of column separator
- `index_col`: defines which column should be read as the index (row label)
- `usecols`: allows reading only a subset of the columns
- `skiprows` and `skipfooter`: skip rows at the start/end of the file
- `nrows`: limits the number of rows to read
- `na_values`: defines the values to be treated as null
- `parse_dates`: defines the columns that contain dates to be parsed
- `dayfirst`: indicates that the day comes before the month in the dates to be parsed (e.g. DD/MM/YYYY)
- `thousands`: defines the thousands numeric separator
- `decimal`: defines the decimal numeric separator
- `encoding`: defines the type of encoding to be used, default utf-8.

For more information, consult the corresponding [documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html).
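
Several of the options listed above can be combined in a single call. The sketch below is only an illustration: it reads a small in-memory text (made-up rows, not part of the wine dataset) that uses `;` as separator, European-style numbers, and day-first dates. The column names `name`, `rated` and `price` and the `missing` marker are assumptions made for this example.

```python
import io

import pandas as pd

# Made-up sample data: semicolon-separated, day-first dates, "." for thousands, "," for decimals
raw = io.StringIO(
    "name;rated;price\n"
    "Alpha;03/02/2019;1.250,50\n"
    "Beta;15/02/2019;missing\n"
)

df = pd.read_csv(
    raw,
    sep=';',                 # column separator
    index_col='name',        # use the "name" column as row labels
    parse_dates=['rated'],   # parse this column as dates
    dayfirst=True,           # 03/02/2019 is 3 February, not 2 March
    thousands='.',           # 1.250 means one thousand two hundred fifty
    decimal=',',             # ,50 is the fractional part
    na_values=['missing'],   # treat this marker as null
)
print(df)
print(df.dtypes)
```

Reading from `io.StringIO` keeps the example self-contained; with a real file, the first argument would be the file path.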

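**TIP**: if you get a `UnicodeDecodeError: 'utf-8' codec` error, try adding `encoding='latin'` to the read options. This usually solves the issue, because Latin-1 can decode any byte, but check the result: some characters may come out wrong if the file actually uses a different encoding.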
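
The fallback described in the tip can be sketched as follows. The bytes are made up for the example and encoded as Latin-1 on purpose, so that the default UTF-8 read fails; with a real file you would pass the path instead of the `io.BytesIO` object.

```python
import io

import pandas as pd

# Made-up content, deliberately encoded as Latin-1 rather than UTF-8
payload = "winery,region\nChâteau Margaux,Médoc\n".encode('latin-1')

try:
    # Default encoding is UTF-8, which cannot decode these bytes
    df = pd.read_csv(io.BytesIO(payload))
except UnicodeDecodeError:
    # Retry with Latin-1, as suggested in the tip
    df = pd.read_csv(io.BytesIO(payload), encoding='latin-1')

print(df)
```

After such a fallback it is worth checking accented characters in the output, since a wrong guess of encoding produces garbled text rather than an error.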

In [3]:
pd.read_csv(
    'data/wine-reviews/winemag-data-130k-v2.csv',
    usecols=['wine_id', 'country', 'designation', 'price'],
    index_col='wine_id',
    nrows=5,
)

wine_id,country,designation,price
0,Italy,Vulkà Bianco,
1,Portugal,Avidagos,15.0
2,US,,14.0
3,US,Reserve Late Harvest,13.0
4,US,Vintner's Reserve Wild Child Block,65.0


### ***EXERCISE 2.1***
Using the pandas documentation, find a way to read the file `data/wine-reviews/exercise_2.1.txt` in the following way:
- read the file as tab-separated
- parse the date in the `date of rating` column, in day/month format
- set `country` as the index
- skip the 3rd and 4th rows

In [4]:
# insert solution here

## Creating data
To create a `pd.DataFrame` from data, we have to follow the constructor specifications.

In general, a DataFrame can be created from:
- a dictionary
- an iterable (np.array, list, pd.Series, ...)

In [5]:
# using dictionary
data = {'Name': ['Bob', 'Mary'], 'Cats':[2,0], 'Dogs':[1,2]}

pd.DataFrame(data)

Unnamed: 0,Name,Cats,Dogs
0,Bob,2,1
1,Mary,0,2


In [6]:
# using 2D iterable
data = [['Bob', 2, 1], ['Mary', 0, 2]]
labels = ['Name','Cats','Dogs'] 

pd.DataFrame(data, columns=labels)

Unnamed: 0,Name,Cats,Dogs
0,Bob,2,1
1,Mary,0,2
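
The cells above cover a dictionary and a nested list. For completeness, here is a minimal sketch of the other two inputs mentioned earlier, a NumPy array and pandas Series, reusing the same toy values. The variable names (`from_array`, `from_series`) are illustrative, and using the names as the index is a choice made for this example.

```python
import numpy as np
import pandas as pd

# From a 2D NumPy array: rows are records, column and row labels are passed separately
arr = np.array([[2, 1], [0, 2]])
from_array = pd.DataFrame(arr, columns=['Cats', 'Dogs'], index=['Bob', 'Mary'])

# From Series: each Series becomes a column, rows are aligned on the Series index
cats = pd.Series([2, 0], index=['Bob', 'Mary'])
dogs = pd.Series([1, 2], index=['Bob', 'Mary'])
from_series = pd.DataFrame({'Cats': cats, 'Dogs': dogs})

print(from_array)
print(from_series)
```

Both constructions hold the same values; the Series variant is handy when the columns already exist as separate labelled objects.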


### ***EXERCISE 2.2***
Create a DataFrame like the following:

|        | r   | g   | b   | hex     |
|--------|-----|-----|-----|---------|
| blue   | 0   | 0   | 255 | #0000ff |
| olive  | 85  | 107 | 47  | #556B2F |
| sienna | 160 | 82  | 45  | #A0522D |

In [7]:
# insert solution here