### Pandas Data Structures

- Key building blocks
    - Indexes: sequences of labels
    - Series: 1D array with an index
    - DataFrame: 2D array with an index (row labels) and columns
- Indexes
    - Immutable (like dictionary keys)
    - Homogeneous in data type (like NumPy arrays)
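A minimal sketch of these building blocks, using small hypothetical values: a Series carries one Index of row labels, a DataFrame carries two (`index` for rows, `columns` for column labels), and each Index holds a single dtype, falling back to the generic `object` dtype when the labels are mixed.

```python
import pandas as pd

# Series: 1D values plus one Index of row labels (toy data)
s = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

# DataFrame: 2D values plus two Indexes, one for rows and one for columns
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])

print(type(s.index).__name__)     # Index
print(type(df.columns).__name__)  # Index

# An Index has one dtype: integer labels give an integer dtype,
# while mixed labels fall back to the object dtype
int_idx = pd.Index([10, 20, 30])
mixed_idx = pd.Index([10, "b", 3.0])
print(int_idx.dtype, mixed_idx.dtype)
```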

### Creating a Series

```python
import pandas as pd

prices = [10.70, 10.86, 10.74, 10.71, 10.79]

shares = pd.Series(prices)

shares
```

```
0    10.70
1    10.86
2    10.74
3    10.71
4    10.79
dtype: float64
```

### Creating an index

```python
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']

shares = pd.Series(prices, index=days)

shares
```

```
Mon     10.70
Tue     10.86
Wed     10.74
Thur    10.71
Fri     10.79
dtype: float64
```

### Examining an index

```python
shares.index
```

```
Index(['Mon', 'Tue', 'Wed', 'Thur', 'Fri'], dtype='object')
```

```python
shares.index[2]
```

```
'Wed'
```

```python
shares.index[:2]
```

```
Index(['Mon', 'Tue'], dtype='object')
```

```python
shares.index.names
```

```
FrozenList([None])
```

### Modifying index name

```python
shares.index.name = 'weekday'
shares
```

```
weekday
Mon     10.70
Tue     10.86
Wed     10.74
Thur    10.71
Fri     10.79
dtype: float64
```

### Modifying index entries

```python
shares.index[2] = 'Wednesday'
```

```
TypeError: Index does not support mutable operations
```

```python
shares.index[:4] = ['Monday', 'Tuesday', 'Wednesday', 'Thursday']
```

```
TypeError: Index does not support mutable operations
```
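Because an Index cannot be changed in place, changing a single label means building a new Index. A minimal sketch, using the same prices and weekday labels as above, that confirms the `TypeError` and then uses `Series.rename` with a dict to swap one label; `rename` returns a new Series and leaves the original untouched.

```python
import pandas as pd

prices = [10.70, 10.86, 10.74, 10.71, 10.79]
shares = pd.Series(prices, index=["Mon", "Tue", "Wed", "Thur", "Fri"])

# Item assignment on an Index raises TypeError because Index objects are immutable
try:
    shares.index[2] = "Wednesday"
    mutation_failed = False
except TypeError as err:
    mutation_failed = True
    print("TypeError:", err)

# Workaround: rename() maps old labels to new ones and returns a new Series
# with a new Index; the original Series and its Index are unchanged
renamed = shares.rename(index={"Wed": "Wednesday"})
print(list(renamed.index))  # ['Mon', 'Tue', 'Wednesday', 'Thur', 'Fri']
```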

### Modifying all index entries

```python
shares.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
shares
```

```
Monday       10.70
Tuesday      10.86
Wednesday    10.74
Thursday     10.71
Friday       10.79
dtype: float64
```
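Replacing the whole index works because it swaps in a new Index object rather than mutating the old one. The new labels must still line up with the data, though. A minimal sketch, reusing the prices above, showing that a list of the wrong length is rejected with a `ValueError` and the existing index is kept.

```python
import pandas as pd

prices = [10.70, 10.86, 10.74, 10.71, 10.79]
shares = pd.Series(prices, index=["Mon", "Tue", "Wed", "Thur", "Fri"])

# Replacing the whole index works when the number of labels matches
shares.index = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# A list of the wrong length is rejected; the index assigned above is kept
try:
    shares.index = ["Monday", "Tuesday"]
    length_mismatch_rejected = False
except ValueError as err:
    length_mismatch_rejected = True
    print("ValueError:", err)

print(shares.index.tolist())
```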

### Unemployment data
- 2010 US Census
- Columns: Zip (4 vals), Unemployment (%), Participants

```python
unemployment = pd.read_csv('Un')
```

```python
sales = pd.read_csv('sales.csv', index_col='month')
sales.head()
```

```
       eggs  salt  spam
month                  
Jan      47  12.0    17
Feb     110  50.0    31
Mar     221  89.0    72
Apr      77  87.0    20
May     132   NaN    52
```

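The `index_col` argument tells `read_csv` to use a column as the row Index instead of keeping it as data. A minimal sketch that stands in for `sales.csv` with an in-memory string built from the rows displayed in these notes (the real file is assumed to have the same layout); the empty salt value for May is read as `NaN`.

```python
import io

import pandas as pd

# Stand-in for sales.csv, assembled from the rows shown in these notes
csv_text = """month,eggs,salt,spam
Jan,47,12.0,17
Feb,110,50.0,31
Mar,221,89.0,72
Apr,77,87.0,20
May,132,,52
Jun,205,60.0,55
"""

# index_col='month' moves that column into the row Index
sales = pd.read_csv(io.StringIO(csv_text), index_col="month")

print(sales.index.name)        # month
print(sales.columns.tolist())  # ['eggs', 'salt', 'spam']
print(sales)
```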

```python
# Create the list of new indexes: new_idx
new_idx = [i.upper() for i in sales.index]

# Assign new_idx to sales.index
sales.index = new_idx

# Print the sales DataFrame
print(sales)
```

```
     eggs  salt  spam
JAN    47  12.0    17
FEB   110  50.0    31
MAR   221  89.0    72
APR    77  87.0    20
MAY   132   NaN    52
JUN   205  60.0    55
```

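For string labels, the same upper-casing can be done without a Python loop: an Index of strings exposes the `.str` accessor, which returns a new Index. A minimal sketch on a small hypothetical subset of the sales rows, checking that both approaches give the same labels.

```python
import pandas as pd

# Small hypothetical subset of the sales data
sales = pd.DataFrame(
    {"eggs": [47, 110, 221], "salt": [12.0, 50.0, 89.0], "spam": [17, 31, 72]},
    index=["Jan", "Feb", "Mar"],
)

# List comprehension, as in the cell above
new_idx = [i.upper() for i in sales.index]

# Vectorized alternative: .str.upper() on the Index returns a new Index
upper_idx = sales.index.str.upper()
print(list(upper_idx))  # ['JAN', 'FEB', 'MAR']

sales.index = upper_idx
```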

```python
# Changing index name labels
# Assign the string 'MONTHS' to sales.index.name
sales.index.name = 'MONTHS'

# Print the sales DataFrame
print(sales)

# Assign the string 'PRODUCTS' to sales.columns.name
sales.columns.name = 'PRODUCTS'

# Print the sales DataFrame again
print(sales)
```

```
        eggs  salt  spam
MONTHS                  
JAN       47  12.0    17
FEB      110  50.0    31
MAR      221  89.0    72
APR       77  87.0    20
MAY      132   NaN    52
JUN      205  60.0    55
PRODUCTS  eggs  salt  spam
MONTHS                    
JAN         47  12.0    17
FEB        110  50.0    31
MAR        221  89.0    72
APR         77  87.0    20
MAY        132   NaN    52
JUN        205  60.0    55
```


```python
# Building an index, then a DataFrame
# Generate the list of months: months
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']

# Assign months to sales.index
sales.index = months

# Print the modified sales DataFrame
print(sales)
```

```
PRODUCTS  eggs  salt  spam
Jan         47  12.0    17
Feb        110  50.0    31
Mar        221  89.0    72
Apr         77  87.0    20
May        132   NaN    52
Jun        205  60.0    55
```

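Note that the `MONTHS` label disappeared from the output above while `PRODUCTS` stayed. Assigning a plain list creates a brand-new Index, which has no name, whereas the columns Index was never replaced. A minimal sketch on two hypothetical rows that confirms this and restores the name afterwards.

```python
import pandas as pd

# Two hypothetical rows of the sales data
sales = pd.DataFrame(
    {"eggs": [47, 110], "salt": [12.0, 50.0], "spam": [17, 31]},
    index=["JAN", "FEB"],
)
sales.index.name = "MONTHS"
sales.columns.name = "PRODUCTS"

# Assigning a plain list builds a new, unnamed Index for the rows
sales.index = ["Jan", "Feb"]
name_after_reassign = sales.index.name
print(name_after_reassign)  # None
print(sales.columns.name)   # PRODUCTS

# To keep a label on the rows, set the name again after reassigning
sales.index.name = "MONTHS"
```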


```python
sales
```

```
PRODUCTS  eggs  salt  spam
Jan         47  12.0    17
Feb        110  50.0    31
Mar        221  89.0    72
Apr         77  87.0    20
May        132   NaN    52
Jun        205  60.0    55
```


### Extracting data with a MultiIndex
```python
# Here sales has a sorted (state, month) MultiIndex
# Print sales.loc[['CA', 'TX']]
print(sales.loc[['CA', 'TX']])

# Print sales['CA':'TX']
print(sales['CA':'TX'])
```
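A minimal sketch of these outer-level lookups, assuming a hypothetical `sales` frame indexed by `(state, month)` whose numbers are reused from the single-index examples above. A list of outer labels selects every row for those states, and a label slice on the outer level includes both endpoints (it needs a sorted index).

```python
import pandas as pd

# Hypothetical sales data with a (state, month) MultiIndex
sales = (
    pd.DataFrame(
        {
            "state": ["CA", "CA", "NY", "NY", "TX", "TX"],
            "month": [1, 2, 1, 2, 1, 2],
            "eggs": [47, 110, 221, 77, 132, 205],
            "salt": [12.0, 50.0, 89.0, 87.0, None, 60.0],
            "spam": [17, 31, 72, 20, 52, 55],
        }
    )
    .set_index(["state", "month"])
    .sort_index()
)

# A list of outer-level labels: all months for CA and TX
ca_tx = sales.loc[["CA", "TX"]]

# A slice on the outer level: CA through TX, both ends included
ca_to_tx = sales["CA":"TX"]

print(ca_tx)
print(ca_to_tx)
```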

### Setting & sorting a MultiIndex


```python
# Set the index to be the columns ['state', 'month']: sales
sales = sales.set_index(['state','month'])

# Sort the MultiIndex: sales
sales = sales.sort_index()

# Print the sales DataFrame
print(sales)
```
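A minimal sketch of what `set_index` and `sort_index` do here, on hypothetical rows given in no particular order: the two columns become the outer (`state`) and inner (`month`) levels, and sorting orders rows by the outer level first, then the inner one. A sorted MultiIndex is what makes the label slicing shown earlier possible.

```python
import pandas as pd

# Hypothetical rows in no particular order
sales = pd.DataFrame(
    {
        "state": ["NY", "CA", "TX", "CA", "TX", "NY"],
        "month": [2, 1, 1, 2, 2, 1],
        "eggs": [77, 47, 132, 110, 205, 221],
    }
)

# Two columns become the two levels of a MultiIndex (outer: state, inner: month)
sales = sales.set_index(["state", "month"])
print(list(sales.index.names))  # ['state', 'month']

# sort_index orders by state first, then by month within each state
sales = sales.sort_index()
print(sales.index.is_monotonic_increasing)  # True
print(sales)
```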

### Using .loc[] with nonunique indexes

As Dhavide mentioned in the video, it is preferable to have a meaningful index that uniquely identifies each row. pandas does not require unique index values in DataFrames, but lookups behave more predictably when they are unique: with a repeated label, `.loc` returns every matching row instead of a single row. To see an example of this, you will index your sales data by 'state', which repeats across months, in this exercise.

As always, begin by printing the sales DataFrame in the IPython Shell and inspecting it.
```python
# Set the index to the column 'state': sales
sales = sales.set_index('state')

# Print the sales DataFrame
print(sales)

# Access the data from 'NY'
print(sales.loc['NY'])

```
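A minimal sketch of the difference, on hypothetical rows where each state appears twice: with `state` as a non-unique index, `.loc['NY']` returns a DataFrame holding both NY rows, while on a unique `(state, month)` index the full key picks out a single row as a Series.

```python
import pandas as pd

# Hypothetical rows: each state appears twice, so 'state' alone is not unique
sales = pd.DataFrame(
    {
        "state": ["CA", "CA", "NY", "NY", "TX", "TX"],
        "month": [1, 2, 1, 2, 1, 2],
        "eggs": [47, 110, 221, 77, 132, 205],
    }
).set_index("state")

print(sales.index.is_unique)  # False

# A repeated label returns a DataFrame with every matching row
ny = sales.loc["NY"]
print(type(ny).__name__)  # DataFrame

# With a unique (state, month) index, the full key returns one row as a Series
unique_sales = sales.reset_index().set_index(["state", "month"])
ny_jan = unique_sales.loc[("NY", 1)]
print(type(ny_jan).__name__)  # Series
```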

### Indexing multiple levels of a MultiIndex

Looking up indexed data is fast and efficient. You have already seen that lookups based on the outermost level of a MultiIndex work just like lookups on DataFrames that have a single-level Index.

Looking up data based on inner levels of a MultiIndex can be trickier. In this exercise, you will use your sales DataFrame to do some increasingly complex lookups.

The trickiest lookups are those that access inner levels of the index. In that case, use `slice(None)` for the outermost dimension(s) instead of the usual `:` (a bare `:` is not valid Python inside a tuple), or use `pd.IndexSlice`. You can refer to the pandas documentation for more details. For example, in the video, Dhavide used the following code to extract rows for all symbols for the dates October 3 through October 4, 2016, inclusive:

```python
stocks.loc[(slice(None), slice('2016-10-03', '2016-10-04')), :]
```

Pay particular attention to the tuple:

```python
(slice(None), slice('2016-10-03', '2016-10-04'))
```
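A minimal sketch of that lookup on a hypothetical `stocks` frame. The symbols and closing prices are made up, and dates are stored as ISO strings (an assumption; the video's data may use parsed dates), which sort and slice in calendar order. The index is sorted first because label slicing on a MultiIndex requires it.

```python
import pandas as pd

# Hypothetical closing prices indexed by (Symbol, Date); dates are ISO strings
stocks = (
    pd.DataFrame(
        {
            "Symbol": ["AAPL", "AAPL", "AAPL", "MSFT", "MSFT", "MSFT"],
            "Date": ["2016-10-03", "2016-10-04", "2016-10-05"] * 2,
            "Close": [112.52, 113.00, 113.05, 57.42, 57.24, 57.64],
        }
    )
    .set_index(["Symbol", "Date"])
    .sort_index()
)

# slice(None) plays the role of ':' on the outer level (all symbols);
# the inner slice keeps Oct. 3 through Oct. 4, both ends included
window = stocks.loc[(slice(None), slice("2016-10-03", "2016-10-04")), :]
print(window)
```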


```python
# Look up data for NY in month 1: NY_month1
NY_month1 = sales.loc[('NY', 1)]

# Look up data for CA and TX in month 2: CA_TX_month2
CA_TX_month2 = sales.loc[(['CA', 'TX'], 2), :]

# Look up data for all states in month 2: all_month2
all_month2 = sales.loc[(slice(None), 2), :]
```
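The text above mentions `pd.IndexSlice` as an alternative to `slice(None)`. A minimal sketch, on the same hypothetical `(state, month)` sales frame used earlier in these notes, that writes the month-2 lookups with `pd.IndexSlice` and checks they match the `slice(None)` form. `IndexSlice` simply lets you write `:` inside the brackets and turns it into the equivalent tuple of slices.

```python
import pandas as pd

# Hypothetical sales data with a sorted (state, month) MultiIndex
sales = (
    pd.DataFrame(
        {
            "state": ["CA", "CA", "NY", "NY", "TX", "TX"],
            "month": [1, 2, 1, 2, 1, 2],
            "eggs": [47, 110, 221, 77, 132, 205],
            "spam": [17, 31, 72, 20, 52, 55],
        }
    )
    .set_index(["state", "month"])
    .sort_index()
)

idx = pd.IndexSlice

# Same lookups as above, written with IndexSlice instead of slice(None)
all_month2 = sales.loc[idx[:, 2], :]
ca_tx_month2 = sales.loc[idx[["CA", "TX"], 2], :]

# Compare with the explicit tuple-of-slices form
same_as_slice_none = all_month2.equals(sales.loc[(slice(None), 2), :])
print(all_month2)
print(same_as_slice_none)  # True
```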