# Splitting lists into rows if you have Pandas before 0.25.0

Version 0.25.0 is the release that introduced the `.explode()` method, which handles this directly.

In [1]:
import pandas as pd
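
For comparison, here is a minimal sketch of what the newer route looks like on Pandas 0.25.0 or later. The inline `ps` DataFrame is a two-row stand-in for the workbook used below, and the `state` column name is my own choice, not something the original data defines.

```python
import pandas as pd

# Stand-in data: two rows copied by hand from the PeopleStates table.
ps = pd.DataFrame({
    "name": ["Bobby", "Cale"],
    "states": ["Wyoming,Michigan", "South Dakota"],
})

# Split each string into a list, then give every list element its own row.
exploded = (
    ps.assign(state=ps["states"].str.split(","))
      .explode("state")
      .drop(columns=["states"])
      .reset_index(drop=True)
)
print(exploded)
```

The rest of this notebook gets to the same shape with `split`, `concat`, and `melt`, which work on older versions too.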

## Splitting lists into columns

This wasn't part of the original Tidy Data paper, but it's an example I run into all the time and I haven't seen it documented in very many places.

The data is in a sub-folder called `data`. The `read_excel()` function **reads the first sheet in the workbook by default unless you specify another with `sheet_name=`**.

In [2]:
ps = pd.read_excel('./data/PeopleStates.xlsx')
ps

Unnamed: 0,name,states
0,Bobby,"Wyoming,Michigan"
1,Sue,"Wisconsin,Nevada,California"
2,Tamika,"Florida,Washington"
3,Cale,South Dakota
4,Iris,"Washington,Oregon,California"


#### Splitting strings on a delimiter character

Here we do a "splitting" operation on the column, turning what is currently a single string containing commas into a list of the items between the commas.

*Note: you will end up with a single column of lists if you don't pass `expand=True`, which tells Pandas that you intend to "expand the dimensionality" of the data set.*

*Notice also that the DataFrame expands to enough columns to accommodate the list with the most elements, unless you specify a limit with `n=`, and rows whose lists have fewer elements get `None` in the extra columns.*

In [6]:
split_states = ps["states"].str.split(',', expand=True)
split_states

Unnamed: 0,0,1,2
0,Wyoming,Michigan,
1,Wisconsin,Nevada,California
2,Florida,Washington,
3,South Dakota,,
4,Washington,Oregon,California
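
To see both notes above in action, here is a small sketch on a hand-typed Series (the `states` variable is a stand-in, not the workbook column). It shows the list-valued result you get without `expand=True`, and how `n=` caps the number of splits.

```python
import pandas as pd

states = pd.Series(["Wyoming,Michigan", "Wisconsin,Nevada,California", "South Dakota"])

# Without expand=True the result is one Series whose values are Python lists.
as_lists = states.str.split(",")

# n=1 allows at most one split per string, so there are at most two columns.
# Anything after the first comma stays together in the second column.
limited = states.str.split(",", n=1, expand=True)

print(as_lists)
print(limited)
```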


#### Concatenation – `concat()`

Pandas will use the Index to align rows of the original `ps.name` Series and the `split_states` DataFrame that are being concatenated.

- `axis=0` is down the rows
- `axis=1` is across the columns.

Let's put the expanded states and the names back together into one table.

In [7]:
pexp = pd.concat([ps.name, split_states], axis=1)
pexp

Unnamed: 0,name,0,1,2
0,Bobby,Wyoming,Michigan,
1,Sue,Wisconsin,Nevada,California
2,Tamika,Florida,Washington,
3,Cale,South Dakota,,
4,Iris,Washington,Oregon,California
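
The index alignment is easy to miss when both inputs happen to be in the same order, so here is a small sketch with a deliberately shuffled index. The `names` and `first_state` Series are made up for illustration.

```python
import pandas as pd

names = pd.Series(["Bobby", "Sue", "Tamika"], name="name")  # index 0, 1, 2
first_state = pd.Series(
    ["Florida", "Wisconsin", "Wyoming"],
    index=[2, 1, 0],  # same labels, reversed order
    name="state",
)

# axis=1 places the inputs side by side, matching rows by index label,
# not by position.
side_by_side = pd.concat([names, first_state], axis=1)

# axis=0 stacks the inputs end to end into one longer Series.
stacked = pd.concat([names, first_state], axis=0)

print(side_by_side)
print(stacked)
```

Bobby still ends up next to Wyoming, because both carry the label `0`, even though Wyoming is last in `first_state`.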


## Melting data across columns into rows

- the `id_vars` columns will be repeated and not un-pivoted
- all other columns will be melted down into a single column (values)
- with their column names as a separate column (variables)

When we don't specify a `var_name=` for `melt()`, it will default to "variable".

In [8]:
ptidy = pd.melt(pexp, id_vars=['name'], value_name='state')
ptidy

Unnamed: 0,name,variable,state
0,Bobby,0,Wyoming
1,Sue,0,Wisconsin
2,Tamika,0,Florida
3,Cale,0,South Dakota
4,Iris,0,Washington
5,Bobby,1,Michigan
6,Sue,1,Nevada
7,Tamika,1,Washington
8,Cale,1,
9,Iris,1,Oregon
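
If you would rather keep the column that records where each value came from, you can name it with `var_name=`. This is a sketch on a hand-built two-row table; `wide` and the `position` name are my own stand-ins.

```python
import pandas as pd

wide = pd.DataFrame({
    "name": ["Bobby", "Cale"],
    0: ["Wyoming", "South Dakota"],
    1: ["Michigan", None],
})

# "name" is repeated for every melted value; the old column labels 0 and 1
# land in "position", and the cell contents land in "state".
long = pd.melt(wide, id_vars=["name"], var_name="position", value_name="state")
print(long)
```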


#### Drop columns

In this case we don't need the "variable" column. There are a couple of ways we can get rid of, or *drop*, unwanted columns. We can

- Specify a list of column names to select only certain columns to keep, dropping others that aren't needed (we'll cover a strange point about this method in a second)
- Use the `drop()` method

Since the `drop()` method can drop either rows or columns from a DataFrame, we need to either

- tell Pandas what labels to drop, plus the axis along which to drop (0=rows, 1=columns)
- or we can explicitly say `columns=` or `index=` (for rows) **<- I think this way is more straightforward**


In [9]:
pnamestate = ptidy.drop(columns=['variable'])
pnamestate.head()

Unnamed: 0,name,state
0,Bobby,Wyoming
1,Sue,Wisconsin
2,Tamika,Florida
3,Cale,South Dakota
4,Iris,Washington
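
Here are the options side by side, as a sketch on a two-row copy of the melted table (typed in by hand). Note that the keyword for rows is `index=`.

```python
import pandas as pd

ptidy = pd.DataFrame({
    "name": ["Bobby", "Sue"],
    "variable": [0, 0],
    "state": ["Wyoming", "Wisconsin"],
})

# Three ways to end up with just "name" and "state".
by_axis = ptidy.drop("variable", axis=1)        # labels plus an axis
by_keyword = ptidy.drop(columns=["variable"])   # explicit keyword
by_selection = ptidy[["name", "state"]]         # keep-list instead of drop-list

# Dropping a row uses index= with the row's label.
without_first_row = ptidy.drop(index=[0])
```

All of these return a new DataFrame and leave `ptidy` as it was.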


## Dropping the Null / NaN / NA state rows

#### *NOTE: "inplace"*

- Most methods return a modified copy of the DataFrame instead of changing the original
- Many methods accept an `inplace=True` argument, which changes the original object instead of returning a copy
- **Be careful! You're writing over your data in place!**
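
A small sketch of the difference, using a made-up two-row DataFrame. In the Pandas versions I have used, a call with `inplace=True` returns `None`, so assigning its result to a variable is a common mistake.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Cale", "Iris"], "state": [None, "Oregon"]})

# Default: a new DataFrame comes back and df still has both rows.
cleaned = df.dropna()
rows_before = len(df)

# inplace=True: df itself loses the null row and the call returns None.
result = df.dropna(inplace=True)
rows_after = len(df)

print(rows_before, rows_after, result)
```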

#### `dropna()` to drop nulls

- Defaults to dropping any row that has a null/None in **any** column
- You can specify a subset of columns to test instead, using `subset=`.

In [13]:
pnamestate.dropna(inplace=True)
pnamestate

Unnamed: 0,name,state
0,Bobby,Wyoming
1,Sue,Wisconsin
2,Tamika,Florida
3,Cale,South Dakota
4,Iris,Washington
5,Bobby,Michigan
6,Sue,Nevada
7,Tamika,Washington
9,Iris,Oregon
11,Sue,California
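
The `subset=` option, and the gaps left in the index above, can be sketched like this. The three-row `df` is invented for illustration, and renumbering with `reset_index(drop=True)` is an optional extra step, not something the cell above does.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Cale", "Iris", None],
    "state": [None, "Oregon", "Nevada"],
})

# Default: a null in any column drops the row, so only Iris survives.
any_null = df.dropna()

# subset= only checks the listed columns, so the nameless Nevada row stays.
state_only = df.dropna(subset=["state"])

# dropna keeps the original index labels; reset_index renumbers from 0.
renumbered = state_only.reset_index(drop=True)

print(state_only)
print(renumbered)
```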
