# More Pandas, Part 2

We can get the CSV of the Austin Animal Center data [here](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238). Click 'Export', and a 'CSV' link will appear.

In [None]:
import numpy as np
import pandas as pd
animals = pd.read_csv('/Users/gdamico/Downloads/Austin_Animal_Center_Outcomes.csv')
animals.head()

### 3. Reshaping a DataFrame

#### .pivot()

Those of you familiar with Excel have probably used Pivot Tables. Pandas has similar functionality.

In [None]:
animals.pivot(values='Age upon Outcome', columns='Animal Type').head()
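To see what `.pivot()` does on something small, here is a minimal sketch using a hypothetical toy DataFrame (the column names and numbers below are made up for illustration and do not come from the Austin data). Each unique value in the `columns` argument becomes its own column, and each unique value in `index` becomes a row label.

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, year) measurement.
long_df = pd.DataFrame({
    'city': ['Austin', 'Austin', 'Dallas', 'Dallas'],
    'year': [2019, 2020, 2019, 2020],
    'adoptions': [120, 135, 90, 110],
})

# Each unique 'year' becomes a column; each unique 'city' becomes a row label.
wide_df = long_df.pivot(index='city', columns='year', values='adoptions')
print(wide_df)
```

Note that `.pivot()` does no aggregation: if the same (index, column) pair appeared twice, it would raise an error, which is one reason to reach for `.pivot_table()` instead.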

Grouping by two different columns can be very helpful, but it has the unsavory side effect of creating a two-level index. This can be a good time to use `.pivot()` or `.pivot_table()`.

In [None]:
animals.groupby(by=['Outcome Type', 'Sex upon Outcome']).agg(len)

In [None]:
animals.pivot_table(index='Outcome Type', columns='Sex upon Outcome', aggfunc=len)
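The relationship between the two cells above can be sketched on a small, hypothetical DataFrame (the names `toy`, `outcome`, `sex`, and `animal_id` are assumptions for illustration). The `groupby` result carries a two-level index, while `pivot_table` moves one of the grouping columns onto the column axis.

```python
import pandas as pd

# Hypothetical miniature version of the outcomes data.
toy = pd.DataFrame({
    'outcome': ['Adoption', 'Adoption', 'Transfer', 'Transfer', 'Adoption'],
    'sex': ['Male', 'Female', 'Male', 'Male', 'Female'],
    'animal_id': ['A1', 'A2', 'A3', 'A4', 'A5'],
})

# Grouping by two columns yields a Series with a two-level index.
grouped = toy.groupby(['outcome', 'sex'])['animal_id'].count()
print(grouped.index.nlevels)

# pivot_table puts one grouping column on the rows and the other on the columns.
# fill_value=0 replaces the missing (Transfer, Female) combination with 0.
table = toy.pivot_table(index='outcome', columns='sex',
                        values='animal_id', aggfunc='count', fill_value=0)
print(table)

# Unstacking the inner index level of the groupby result gives the same layout.
unstacked = grouped.unstack(fill_value=0)
print(unstacked)
```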

### 4. Methods for Combining and Reshaping DataFrames: .join(), .merge(), pd.concat(), pd.melt()

#### .join()

In [None]:
toy1 = pd.DataFrame([[63, 142], [33, 47]], columns=['age', 'HP'])
toy2 = pd.DataFrame([[63, 100], [33, 200]], columns=['age', 'MP'])

In [None]:
toy1

In [None]:
toy2

In [None]:
toy1.set_index('age').join(toy2.set_index('age'))

For more on this method, check out the [doc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html)!
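One way to see why the cell above calls `.set_index('age')` first: `.join()` aligns rows on the index, not on a shared column. The sketch below reuses `toy1` but, as an assumption for illustration, reorders the rows of `toy2` so the difference is visible.

```python
import pandas as pd

toy1 = pd.DataFrame([[63, 142], [33, 47]], columns=['age', 'HP'])
# Same data as toy2 above, but with the rows in the opposite order (illustrative change).
toy2 = pd.DataFrame([[33, 200], [63, 100]], columns=['age', 'MP'])

# Joining on the default integer index pairs rows by index label, not by age.
# Both frames have an 'age' column, so suffixes are needed to tell them apart.
by_position = toy1.join(toy2, lsuffix='_1', rsuffix='_2')
print(by_position)

# With 'age' as the index, .join() lines the rows up by age.
by_age = toy1.set_index('age').join(toy2.set_index('age'))
print(by_age)
```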

#### .merge()

In [None]:
ds_chars = pd.read_csv('ds_chars.csv', index_col=0)
ds_chars

In [None]:
states = pd.read_csv('states.csv', index_col=0)
states

In [None]:
ds_chars.merge(states, left_on='home_state', right_on='state', how='inner')
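Since the contents of `ds_chars.csv` and `states.csv` are not shown here, the following is a minimal sketch with two hypothetical stand-in DataFrames (`people` and `regions`, with made-up values). It illustrates how `left_on`, `right_on`, and `how` interact.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files.
people = pd.DataFrame({
    'name': ['Ana', 'Ben', 'Cy'],
    'home_state': ['TX', 'WA', 'ZZ'],   # 'ZZ' has no match in `regions`
})
regions = pd.DataFrame({
    'state': ['TX', 'WA', 'NY'],
    'region': ['South', 'West', 'Northeast'],
})

# how='inner' keeps only rows whose key appears in both DataFrames.
inner = people.merge(regions, left_on='home_state', right_on='state', how='inner')
print(inner)

# how='left' keeps every row of `people`; unmatched rows get NaN on the right side.
left = people.merge(regions, left_on='home_state', right_on='state', how='left')
print(left)
```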

#### pd.concat()

This function takes a *list* (or other sequence) of pandas objects as its first argument.

N.B. The cell below may produce a **Deprecation Warning**.

In [None]:
ds_full = pd.concat([ds_chars, states])
ds_full

`pd.concat()`, like many other pandas operations, makes use of an `axis` parameter. For this particular function I need to specify whether I want to concatenate the DataFrames *row-wise* (`axis=0`) or *column-wise* (`axis=1`). The default is `axis=0`; passing `axis=1` places the DataFrames side by side instead.
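A minimal sketch of the two `axis` settings, using two small hypothetical DataFrames (`a` and `b` are made up for illustration):

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})
b = pd.DataFrame({'x': [5, 6], 'z': [7, 8]})

# axis=0 (the default) stacks rows; columns missing from one frame are filled with NaN.
rows = pd.concat([a, b], axis=0)
print(rows.shape)

# axis=1 places the frames side by side, aligning them on the index.
cols = pd.concat([a, b], axis=1)
print(cols.shape)
```

With `axis=0` the original row labels are kept, so the result here has repeated index values; passing `ignore_index=True` renumbers them.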

#### pd.melt()

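Melting converts a DataFrame from wide format to long format: the column names are gathered into a single 'variable' column and their entries into a 'value' column, so each row holds one variable-value pair.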

In [None]:
pd.melt(ds_full)

[Here](https://towardsdatascience.com/transforming-data-in-python-with-pandas-melt-854221daf507) is a use case for `pd.melt()`.
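In practice `pd.melt()` is often called with `id_vars`, which names the columns to keep as identifiers rather than melt. A minimal sketch on a hypothetical wide DataFrame (the data and the names `stat` and `points` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical wide-format data: one column per statistic.
wide = pd.DataFrame({
    'name': ['Ana', 'Ben'],
    'HP': [142, 47],
    'MP': [100, 200],
})

# Keep 'name' as an identifier; melt 'HP' and 'MP' into (stat, points) pairs.
long = pd.melt(wide, id_vars='name', value_vars=['HP', 'MP'],
               var_name='stat', value_name='points')
print(long)
```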

### 5. Making Use of Categories: One-Hot Encoding

Pandas has a one-hot encoder called `get_dummies()`, which is good for exploratory data analysis (EDA).

This might be good to use if we're in the **data-understanding** stage (Stage 2) of our CRISP-DM process.

We can call it on a DataFrame as a whole or on a Series (column).

In [None]:
pd.get_dummies(animals['Animal Type'])

If, however, we're in a later stage of the process and we're interested, say, in preparing a data pipeline, `pandas.get_dummies()` will prove inferior to other tools.

In practice, we will **not** use `pandas.get_dummies()`. The library Scikit-Learn (`sklearn`, included with your Anaconda installation) has a `OneHotEncoder` class that creates an object that persists. This makes it much more apt for production environments, and so it's good to get in the habit of using it.

Ultimately, we will use **many** tools from sklearn.

In [None]:
from sklearn.preprocessing import OneHotEncoder

In [None]:
ohe = OneHotEncoder()

In [None]:
ohe.fit(animals[['Animal Type']])

Now that the `OneHotEncoder` has been fitted to our data, it has newly available attributes and methods. In particular, it has access to the different categories that we're replacing:

In [None]:
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
ohe.get_feature_names_out()

We'll have much more to say about `sklearn` syntax and about Python's object structure. But let's now transform our data to see what the new table looks like:

In [None]:
ohe.transform(animals[['Animal Type']])

To save storage space, the return value is a **sparse matrix**, but we can "re-inflate" it if we want to see it in tabular form:

In [None]:
types_encoded = ohe.transform(animals[['Animal Type']]).todense()
types_encoded
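To make the sparse-versus-dense distinction concrete, here is a small sketch on hypothetical data (the `toy` DataFrame is made up). It uses `.toarray()`, which returns a plain NumPy array, as an alternative to `.todense()`.

```python
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single-column data with three categories.
toy = pd.DataFrame({'animal': ['Dog', 'Cat', 'Dog', 'Bird']})

# fit_transform fits the encoder and encodes the data in one step.
encoded = OneHotEncoder().fit_transform(toy[['animal']])

# The result is a SciPy sparse matrix that stores only the nonzero entries.
print(sparse.issparse(encoded))
print(encoded.shape, encoded.nnz)

# Convert to a regular NumPy array to view every cell, zeros included.
dense = encoded.toarray()
print(dense)
```

Each row contains exactly one nonzero entry, so the number of stored values equals the number of rows no matter how many categories there are.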

Let's put it into a DataFrame:

In [None]:
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
pd.DataFrame(types_encoded, columns=ohe.get_feature_names_out()).head()
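The earlier point about the encoder being an object that persists can be sketched as follows. The two small DataFrames are hypothetical, and `handle_unknown='ignore'` is an optional setting chosen here for illustration rather than something the cells above use.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: a training batch and a later batch with fewer categories.
train = pd.DataFrame({'animal': ['Dog', 'Cat', 'Bird']})
new = pd.DataFrame({'animal': ['Dog', 'Dog']})

# get_dummies only sees the data it is given, so the two results have different columns.
dummies_train = pd.get_dummies(train['animal'])
dummies_new = pd.get_dummies(new['animal'])
print(dummies_train.shape, dummies_new.shape)

# A fitted encoder remembers the categories it saw during fit,
# so later batches are always encoded with the same columns.
ohe = OneHotEncoder(handle_unknown='ignore')
ohe.fit(train[['animal']])
encoded_new = ohe.transform(new[['animal']]).toarray()
print(encoded_new.shape)
```

Because the column layout is fixed at fit time, the encoded output can be fed to a model trained on the original batch without realigning columns by hand.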