# Introduction
Welcome to the **[Learn Pandas](https://www.kaggle.com/learn/pandas)** track. These hands-on exercises are aimed at readers who have worked with Pandas a little before.
Each page has its own list of `relevant resources` you can use if you get stumped. The top item in each list has been custom-made to help you with the exercises on that page.

The first step in most data analytics projects is reading the data file. In this section, you'll create `Series` and `DataFrame` objects, both by hand and by reading data files.

# Relevant Resources
* **[Creating, Reading and Writing Reference](https://www.kaggle.com/residentmario/creating-reading-and-writing-reference)**
* [General Pandas Cheat Sheet](https://assets.datacamp.com/blog_assets/PandasPythonForDataScience.pdf)

# Set Up
**First, fork this notebook using the button towards the top of the screen. Then you can run and edit code in the cells below.**

Run the code cell below to load the libraries you will need (including code to check your answers).

In [None]:
import pandas as pd
pd.set_option('display.max_rows', 5)  # full option name; the short 'max_rows' is ambiguous in newer pandas

import sys
sys.path.append('..\\input\\advanced-pandas-exercises\\')
sys.path
from creating_reading_writing import *

# Checking Answers

You can check your answers in each of the exercises that follow using the `check_qN` function imported in the code cell above (replacing `N` with the number of the exercise). For example, here's how you would check an incorrect answer to exercise 1:

In [11]:
check_q1(pd.DataFrame())

False

For the questions that follow, if you use `check_qN` on your answer, and your answer is right, a simple `True` value will be returned.

If you get stuck, you can run `print(answer_qN())` to print the answer outright.

# Exercises

**Exercise 1**: Create a `DataFrame` that looks like this:

![](https://i.imgur.com/Ax3pp2A.png)

In [16]:
pd1 = pd.DataFrame(data = [[30, 21]], columns = ['Apples', 'Bananas'])
check_q1(pd1)

True

In [17]:
print(answer_q1())

pd.DataFrame({'Apples': [30], 'Bananas': [21]})
None
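
The two constructions above (a list of rows with `columns=`, and a dict of columns) should produce the same table. The following is a minimal sketch comparing them; the variable names `by_rows` and `by_columns` are illustrative and not part of the exercise.

```python
import pandas as pd

# Row-oriented: each inner list is one row; column labels are passed separately.
by_rows = pd.DataFrame(data=[[30, 21]], columns=['Apples', 'Bananas'])

# Column-oriented: each dict key is a column label mapped to that column's values.
by_columns = pd.DataFrame({'Apples': [30], 'Bananas': [21]})

# DataFrame.equals checks that both frames have the same shape, values and dtypes.
same = by_rows.equals(by_columns)
print(same)
```

The dict form tends to read more naturally when the data is already organised by column, while the list-of-rows form suits data that arrives record by record.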


**Exercise 2**: Create the following `DataFrame`:

![](https://i.imgur.com/CHPn7ZF.png)

In [5]:
pd2 = pd.DataFrame(data = [[35, 21],[41, 34]], columns = ['Apples', 'Bananas'], index = ['2017 Sales', '2018 Sales'])
print(pd2)

            Apples  Bananas
2017 Sales      35       21
2018 Sales      41       34


In [6]:
check_q2(pd2)

True

In [7]:
print(answer_q2())

pd.DataFrame(
    {'Apples': [35, 41], 'Bananas': [21, 34]},
    index=['2017 Sales', '2018 Sales']
)
None


In [9]:
pd.DataFrame(
    {'Apples': [35, 41], 'Bananas':[21, 24]},
    index=['2017 Sales', '2018 Sales']
)

Unnamed: 0,Apples,Bananas
2017 Sales,35,21
2018 Sales,41,24
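
As a small illustration of what the `index=` argument buys you, the sketch below rebuilds the Exercise 2 table and looks up one cell by its row and column labels. The names `sales` and `bananas_2018` are assumptions made for this example only.

```python
import pandas as pd

sales = pd.DataFrame(
    {'Apples': [35, 41], 'Bananas': [21, 34]},
    index=['2017 Sales', '2018 Sales'],
)

# Without index=, the rows would be labelled 0 and 1. With it, a row can be
# addressed by its label through .loc.
bananas_2018 = sales.loc['2018 Sales', 'Bananas']
print(bananas_2018)
```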


**Exercise 3**: Create a `Series` that looks like this:

```
Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object
```

In [19]:
pd3 = pd.Series(
    ['4 cups', '1 cup', '2 large', '1 can'],
    index=['Flour','Milk', 'Eggs', 'Spam'],
    name = 'Dinner'
)

print(pd3)

Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object


In [20]:
check_q3(pd3)
# print(answer_q3())

True
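
An alternative way to build the same `Series` is to pass a dict, where the keys become the index labels. This is only a sketch of that option; the names `ingredients` and `dinner` are illustrative.

```python
import pandas as pd

# A dict supplies index labels (keys) and values together; on current Python
# and pandas versions the insertion order of the dict is preserved.
ingredients = {'Flour': '4 cups', 'Milk': '1 cup', 'Eggs': '2 large', 'Spam': '1 can'}
dinner = pd.Series(ingredients, name='Dinner')

print(dinner)
```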

**Exercise 4**: Read the following `csv` dataset on wine reviews into a `DataFrame`:

![](https://i.imgur.com/74RCZtU.png)

The filepath to the CSV file is `../../input/wine-reviews/winemag-data_first150k.csv`.

In [28]:
pd4 = pd.read_csv('../../input/wine-reviews/winemag-data_first150k.csv', index_col = 0)
pd4

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,US,This tremendous 100% varietal wine hails from ...,Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,Spain,"Ripe aromas of fig, blackberry and cassis are ...",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
...,...,...,...,...,...,...,...,...,...,...
150928,France,"A perfect salmon shade, with scents of peaches...",Grand Brut Rosé,90,52.0,Champagne,Champagne,,Champagne Blend,Gosset
150929,Italy,More Pinot Grigios should taste like this. A r...,,90,15.0,Northeastern Italy,Alto Adige,,Pinot Grigio,Alois Lageder


In [27]:
check_q4(pd4)

True
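
To see why `index_col=0` matters here, the sketch below reads a tiny in-memory CSV twice, once with and once without that argument. The CSV text is a made-up three-line sample that mimics the layout of the wine reviews file (an unnamed first column holding row numbers); it is not the real dataset.

```python
import io
import pandas as pd

# Hypothetical mini CSV: the first column has no header and holds row numbers.
csv_text = ",country,points\n0,US,96\n1,Spain,96\n"

# Default read: the unnamed first column is kept as an ordinary data column.
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column is used as the row index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_df.columns.tolist())
print(indexed_df.columns.tolist())
```

Without `index_col=0`, pandas names the header-less column `Unnamed: 0` and adds a fresh integer index on top of it, so the row numbers end up stored twice.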

**Exercise 5**: Read the following `xls` sheet into a `DataFrame`: 

![](https://i.imgur.com/QZJBIBF.png)

The filepath to the XLS file is `../../input/publicassistance/xls_files_all/WICAgencies2014ytd.xls`.

Hint: the name of the method you need includes the word `excel`. The name of the sheet is `Pregnant Women Participating`. Don't do any cleanup.

In [39]:
pd5 = pd.read_excel('../../input/publicassistance/xls_files_all/WICAgencies2014ytd.xls', 
                    sheet_name='Pregnant Women Participating')
pd5

Unnamed: 0,WIC PROGRAM -- NUMBER OF PREGNANT WOMEN PARTICIPATING,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13
0,FISCAL YEAR 2014,,,,,,,,,,,,,
1,"Data as of January 05, 2018",,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
104,,,,,,,,,,,,,,
105,All data are preliminary and are subject to re...,,,,,,,,,,,,,


In [40]:
check_q5(pd5)


True

**Exercise 6**: Suppose we have the following `DataFrame`:

In [42]:
q6_df = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])

Save this `DataFrame` to disk as a `csv` file with the name `cows_and_goats.csv`.

In [43]:
q6_df.to_csv("cows_and_goats.csv")

In [45]:
check_q6()

True
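
One way to convince yourself the file was written as intended is to read it back and compare it with the original. The sketch below does this in a temporary directory so it leaves no files behind; the names `tmp_dir`, `csv_path` and `round_trip` are assumptions made for this example.

```python
import os
import tempfile
import pandas as pd

q6_df = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'cows_and_goats.csv')
    # to_csv writes the row index as the first column of the file.
    q6_df.to_csv(csv_path)
    # index_col=0 turns that first column back into the row index.
    round_trip = pd.read_csv(csv_path, index_col=0)

print(round_trip)
```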

**Exercise 7**: This exercise is optional. Read the following `SQL` data into a `DataFrame`:

![](https://i.imgur.com/mmvbOT3.png)

The filepath is `../../input/pitchfork-data/database.sqlite`. Hint: use the `sqlite3` library. The name of the table is `artists`.

In [5]:
import sqlite3

conn = sqlite3.connect('../../input/pitchfork-data/database.sqlite')

# Run the query and load the whole `artists` table into a DataFrame.
pd7 = pd.read_sql_query("SELECT * FROM artists", conn)

# Close the connection once the data is in memory.
conn.close()

pd7

Unnamed: 0,reviewid,artist
0,22703,massive attack
1,22721,krallice
...,...,...
18829,2413,don caballero
18830,3723,neil hamburger


In [6]:
check_q7(pd7)

True
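
If you don't have the Pitchfork database at hand, the same pattern can be tried against an in-memory SQLite database. The sketch below is a stand-in rather than the real data: it creates a small `artists` table with two rows copied from the output above, then reads it with `pd.read_sql_query`. The name `artists` for the resulting `DataFrame` is illustrative.

```python
import sqlite3
from contextlib import closing
import pandas as pd

# ':memory:' creates a throwaway database that lives only for this connection.
with closing(sqlite3.connect(':memory:')) as conn:
    conn.execute("CREATE TABLE artists (reviewid INTEGER, artist TEXT)")
    conn.executemany(
        "INSERT INTO artists VALUES (?, ?)",
        [(22703, 'massive attack'), (22721, 'krallice')],
    )
    # read_sql_query runs the SQL and returns the result set as a DataFrame.
    artists = pd.read_sql_query("SELECT * FROM artists", conn)

print(artists)
```

Wrapping the connection in `contextlib.closing` ensures it is closed even if the query raises an error, which the explicit `conn.close()` call does not guarantee.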