# Pandas

* [datacamp](http://datacamp-community-prod.s3.amazonaws.com/dbed353d-2757-4617-8206-8767ab379ab3)
* [python](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)

In [1]:
import pandas as pd

## DataFrame

The `pd.DataFrame()` constructor creates a DataFrame object, a table of values with labeled rows and columns:

In [2]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

| | Bob | Sue |
|---|---|---|
| Product A | I liked it. | Pretty good. |
| Product B | It was awful. | Bland. |


## Series


A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list.

A Series is, in essence, a single column of a DataFrame. So you can assign row labels to the Series the same way as before, using an index parameter. However, a Series does not have a column name; it only has one overall name:

In [4]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [5]:
pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64
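As a minimal sketch of the "single column" relationship described above, the snippet below pulls one column out of a DataFrame and checks that it is a Series. The `sales` table and its numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical sales table; the names and numbers are illustrative only.
sales = pd.DataFrame(
    {'Product A': [30, 35, 40], 'Product B': [21, 34, 28]},
    index=['2015 Sales', '2016 Sales', '2017 Sales'],
)

column = sales['Product A']   # selecting one column yields a Series

print(type(column).__name__)  # Series
print(column.name)            # Product A
print(list(column.index))     # ['2015 Sales', '2016 Sales', '2017 Sales']
```

The Series keeps the row labels of the DataFrame as its index and takes the column label as its name.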

## Accessors

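In Python, we can access a property of an object as an attribute. Columns of a DataFrame work the same way, so the `country` column of a DataFrame named `reviews` is available as: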

`reviews.country`

If we have a Python dictionary, we can access its values using the indexing ([]) operator. We can do the same with columns in a DataFrame:
`reviews['country']`

To drill down to a single specific value, we need only use the indexing operator `[]` once more:
`reviews['country'][0]`
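The two access styles can be compared on a small example. The `reviews` DataFrame below is a hypothetical stand-in with invented rows, since the real dataset is not part of these notes.

```python
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame used in these notes.
reviews = pd.DataFrame({
    'country': ['Italy', 'Portugal', 'US'],
    'points': [87, 87, 90],
})

by_attribute = reviews.country      # attribute access
by_key = reviews['country']         # indexing operator

print(by_attribute.equals(by_key))  # True
print(reviews['country'][0])        # Italy
```

Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute, so the indexing operator is the safer general choice. Note also that `reviews['country'][0]` looks up the label `0`, which is the first row here only because the DataFrame has the default integer index.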


## iloc

**Index-based selection** means selecting data based on its numerical position in the data. When we use **iloc**, we treat the dataset like a big matrix (a list of lists) that we index into by position. The row selector comes first and the column selector second.

* To select just the second and third entries of the first column:

`reviews.iloc[1:3, 0]`

* To select the last five rows of the dataset:

`reviews.iloc[-5:]`
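Both selections can be tried on a small table. This is only a sketch: the six-row `reviews` DataFrame is invented for the example.

```python
import pandas as pd

# Hypothetical six-row stand-in for the reviews DataFrame.
reviews = pd.DataFrame({
    'country': ['Italy', 'Portugal', 'US', 'US', 'France', 'Spain'],
    'points': [87, 87, 90, 86, 92, 88],
})

second_and_third = reviews.iloc[1:3, 0]  # rows at positions 1 and 2, first column
last_five = reviews.iloc[-5:]            # last five rows, all columns

print(second_and_third.tolist())  # ['Portugal', 'US']
print(len(last_five))             # 5
```

As with ordinary Python slicing, the end position of an `iloc` range is excluded, and negative positions count from the end.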

## loc 
The second paradigm for attribute selection is the one followed by the loc operator: **label-based selection**. In this paradigm, it's the data index value, not its position, which matters. For example, to select three columns by name for every row:

`reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]`
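A minimal sketch of label-based selection, assuming a `reviews` DataFrame that has the three columns named above. The rows and the extra `country` column are invented for the example.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the notes above.
reviews = pd.DataFrame({
    'taster_name': ['Taster A', 'Taster B'],
    'taster_twitter_handle': ['@taster_a', '@taster_b'],
    'points': [87, 90],
    'country': ['Italy', 'US'],
})

# All rows (:), three columns chosen by label.
subset = reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]
print(subset.columns.tolist())

# A single value: row label 0, column label 'country'.
single = reviews.loc[0, 'country']
print(single)  # Italy
```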

## iloc vs loc
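The two operators use different slicing conventions: `iloc` follows the Python convention and excludes the end of the range, while `loc` includes it. For a DataFrame with the default integer index, `df.iloc[0:1000]` returns 1000 entries, while `df.loc[0:1000]` returns 1001 of them. To get 1000 elements using `loc`, you need to go one lower and ask for `df.loc[0:999]`.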
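The difference is easy to check by counting rows. The sketch below assumes a hypothetical DataFrame `df` with 2000 rows and the default integer index.

```python
import pandas as pd

# Hypothetical DataFrame with the default index 0..1999.
df = pd.DataFrame({'value': range(2000)})

print(len(df.iloc[0:1000]))  # 1000: positions 0..999, end excluded
print(len(df.loc[0:1000]))   # 1001: labels 0..1000, end included
print(len(df.loc[0:999]))    # 1000
```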

## index

`set_index()` returns a new DataFrame that uses the given column as its index:

`reviews.set_index("title")`
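A short sketch of what `set_index` does, assuming a `reviews` DataFrame with a `title` column. The two rows are made up for the example.

```python
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame.
reviews = pd.DataFrame({
    'title': ['Wine A', 'Wine B'],
    'points': [87, 90],
})

indexed = reviews.set_index('title')      # returns a new DataFrame

print(indexed.index.name)                 # title
print(indexed.loc['Wine B', 'points'])    # 90
print('title' in reviews.columns)         # True: the original is unchanged
```

Once `title` is the index, rows can be looked up by title with `loc`.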

## Conditional Selection

* `reviews.loc[reviews.country == 'Italy']`
* `reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)]`
* `reviews.loc[(reviews.country == 'Italy') | (reviews.points >= 90)]`

**built-in conditional selectors**
* `reviews.loc[reviews.country.isin(['Italy', 'France'])]`
* `reviews.loc[reviews.price.notnull()]`
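Each selector above first builds a boolean Series and then passes it to `loc`, which keeps the rows where the value is `True`. The sketch below runs all five on a hypothetical four-row `reviews` DataFrame with invented values.

```python
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame; one price is missing.
reviews = pd.DataFrame({
    'country': ['Italy', 'Italy', 'France', 'US'],
    'points': [92, 85, 91, 88],
    'price': [40.0, None, 55.0, 20.0],
})

is_italian = reviews.country == 'Italy'  # boolean Series, one value per row
print(is_italian.tolist())               # [True, True, False, False]

italian_and_good = reviews.loc[is_italian & (reviews.points >= 90)]
italian_or_good = reviews.loc[is_italian | (reviews.points >= 90)]
italy_or_france = reviews.loc[reviews.country.isin(['Italy', 'France'])]
priced = reviews.loc[reviews.price.notnull()]

print(len(italian_and_good))  # 1
print(len(italian_or_good))   # 3
print(len(italy_or_france))   # 3
print(len(priced))            # 3
```

The parentheses around each comparison are required, because `&` and `|` bind more tightly than `==` and `>=` in Python.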


## Assigning Data
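* `reviews['critic'] = 'everyone'` assigns the constant value `'everyone'` to every row of the `critic` column, creating the column if it does not exist.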
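As a small sketch, the snippet below assigns a constant and then an iterable of values to new columns. The `reviews` DataFrame and the `index_backwards` column name are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame.
reviews = pd.DataFrame({'country': ['Italy', 'US', 'France']})

# A scalar is broadcast to every row.
reviews['critic'] = 'everyone'

# An iterable must supply exactly one value per row.
reviews['index_backwards'] = range(len(reviews), 0, -1)

print(reviews['critic'].tolist())           # ['everyone', 'everyone', 'everyone']
print(reviews['index_backwards'].tolist())  # [3, 2, 1]
```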