#### Reading multiple files

~~~
import pandas as pd

filenames = [...]

# Read each file into its own DataFrame
dataframes = [pd.read_csv(filename) for filename in filenames]
~~~

#### Using glob

~~~
from glob import glob

filenames = glob('sales*.csv')

dataframes = [pd.read_csv(filename) for filename in filenames]
~~~
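As a rough end-to-end sketch of the glob pattern above: the two CSV files, their contents, and the temporary directory are made up for illustration, and the final `pd.concat` step is one possible way to combine the resulting list.

```python
import tempfile
from glob import glob
from pathlib import Path

import pandas as pd

# Hypothetical sample data: two tiny CSV files in a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / 'sales-jan.csv').write_text('Company,Units\nAcme,3\nStreeplex,5\n')
(tmp_dir / 'sales-feb.csv').write_text('Company,Units\nAcme,4\nStreeplex,1\n')

# glob() makes no ordering guarantee, so sort the matches for reproducibility.
filenames = sorted(glob(str(tmp_dir / 'sales*.csv')))

dataframes = [pd.read_csv(filename) for filename in filenames]

# Stack the per-file DataFrames into one and renumber the rows.
sales = pd.concat(dataframes, ignore_index=True)
print(sales)
```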


### Reindexing DataFrames

#### "Indexes" vs. "Indices"

- indices: many index labels within Index data structures
- indexes: many pandas Index data structures

**Reindexing**

~~~
ordered = ['Jan', 'Apr', 'Jul', 'Oct']

w_mean2 = w_mean.reindex(ordered)

w_mean2.sort_index() # un-doing
~~~

**Reindexing from another DataFrame's index**

~~~
w_mean.reindex(w_max.index)
~~~

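**Order matters!**

w_max.reindex(w_mean.index) != w_mean.reindex(w_max.index)

The result takes its row labels, in order, from the index passed to .reindex(); labels absent from the calling object are filled with NaN.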
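A small sketch of why the order matters. The values and the `Dec` label are invented, and Series are used here as stand-ins for the `w_mean` and `w_max` objects in the notes.

```python
import pandas as pd

# Hypothetical quarterly values; only the names come from the notes.
w_mean = pd.Series([61.9, 32.1, 68.9, 43.4], index=['Apr', 'Jan', 'Jul', 'Oct'])
w_max = pd.Series([68, 91, 84], index=['Jan', 'Jul', 'Dec'])

# Rows follow w_max.index; 'Dec' is absent from w_mean, so it becomes NaN.
a = w_mean.reindex(w_max.index)

# Rows follow w_mean.index; 'Apr' and 'Oct' are absent from w_max, so they become NaN.
b = w_max.reindex(w_mean.index)

print(a)
print(b)
```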


### Arithmetic with Series & DataFrames

#### Scalar multiplication

~~~
weather.loc['2013-07-01':'2013-07-07','PrecipitationIn'] * 2.54
~~~

#### Relative temperature range

~~~
week1_range = weather.loc['2013-07-01':'2013-07-07',['Min TemperatureF','Max TemperatureF']]

week1_mean = weather.loc['2013-07-01':'2013-07-07','Mean TemperatureF']

week1_range.divide(week1_mean, axis='rows')
~~~
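The reason for `.divide(..., axis='rows')` can be sketched with made-up temperatures: a plain `week1_range / week1_mean` would try to match the Series index (dates) against the DataFrame columns, whereas `axis='rows'` matches it against the row index.

```python
import pandas as pd

# Hypothetical three-day sample; column names follow the notes.
dates = pd.date_range('2013-07-01', periods=3)
week1_range = pd.DataFrame(
    {'Min TemperatureF': [66, 66, 71], 'Max TemperatureF': [79, 84, 86]},
    index=dates,
)
week1_mean = pd.Series([72, 74, 78], index=dates, name='Mean TemperatureF')

# Divide each row of week1_range by that day's mean temperature.
relative = week1_range.divide(week1_mean, axis='rows')
print(relative)
```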

#### Percent changes

~~~
week1_mean.pct_change() * 100
~~~
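For reference, `.pct_change()` computes the fractional change from the previous row, so the first entry has nothing to compare against and is NaN. A minimal sketch with invented values:

```python
import pandas as pd

# Hypothetical daily mean temperatures.
week1_mean = pd.Series(
    [72.0, 74.0, 78.0],
    index=pd.date_range('2013-07-01', periods=3),
)

# (current - previous) / previous, scaled to percent.
pct = week1_mean.pct_change() * 100
print(pct)
```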

#### Addition

~~~
bronze.add(silver, fill_value=0)
~~~

**Chaining**

~~~
bronze.add(silver,fill_value=0).add(gold,fill_value=0)
~~~
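The effect of `fill_value=0` is easiest to see next to the plain `+` operator. The medal counts below are illustrative, and Series stand in for the `bronze` and `silver` objects.

```python
import pandas as pd

# Hypothetical medal counts; the two Series do not share all labels.
bronze = pd.Series({'United States': 1052, 'Soviet Union': 584, 'France': 475})
silver = pd.Series({'United States': 1195, 'Soviet Union': 627, 'Italy': 394})

# Plain addition: a label missing from either side produces NaN.
plain = bronze + silver

# .add() with fill_value=0 treats the missing side as 0 instead.
filled = bronze.add(silver, fill_value=0)

print(plain)
print(filled)
```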

### Appending & concatenating Series

#### append()

- .append(): Series & DataFrame method
- invocation:
	- s1.append(s2)
- stacks the rows of s2 below s1
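- does not adjust the index
	- chain .reset_index(drop=True) to renumber the rows
- deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() is the replacement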

#### concat()

- concat(): pandas module function
- invocation:
	- pd.concat([s1, s2, s3])
- can stack row-wise or column-wise
- does not adjust the index
	- pass ignore_index=True to renumber the rows

#### concat() & .append()

- Equivalence of concat() & .append()
	- result1 = pd.concat([s1,s2,s3])
	- result2 = s1.append(s2).append(s3)
- result1 == result2 element-wise
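A short sketch of the index behaviour described above, using `pd.concat()` only so that it runs on current pandas releases. The Series contents are arbitrary.

```python
import pandas as pd

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])

# Original labels are kept, so the index repeats: 0, 1, 0, 1.
stacked = pd.concat([s1, s2])

# Two ways to get a fresh 0..n-1 index.
renumbered = pd.concat([s1, s2], ignore_index=True)
also_renumbered = stacked.reset_index(drop=True)

print(stacked)
print(renumbered)
```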

#### Using multi-index on rows

~~~
rain1314 = pd.concat([rain2013,rain2014],keys=[2013,2014],axis=0)
~~~

#### Using multi-index on columns

~~~
rain1314 = pd.concat([rain2013,rain2014],keys=[2013,2014],axis='columns')
~~~

#### pd.concat() with dict

~~~
rain_dict = {2013: rain2013, 2014: rain2014}

rain1314 = pd.concat(rain_dict,axis='columns')
~~~
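The three variants above can be tried on toy data. The rainfall figures and the `Precipitation` column name are assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical monthly rainfall for two years.
rain2013 = pd.DataFrame({'Precipitation': [0.10, 0.25]}, index=['Jan', 'Feb'])
rain2014 = pd.DataFrame({'Precipitation': [0.32, 0.05]}, index=['Jan', 'Feb'])

# keys= adds an outer level to the row index.
by_rows = pd.concat([rain2013, rain2014], keys=[2013, 2014], axis=0)

# A dict works the same way: its keys become the outer column level.
by_cols = pd.concat({2013: rain2013, 2014: rain2014}, axis='columns')

print(by_rows.loc[2014])   # all rows for 2014
print(by_cols[2013])       # all columns for 2013
```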

~~~
# Sort the entries of medals: medals_sorted
medals_sorted = medals.sort_index(level=0)

# Print the number of Bronze medals won by Germany
print(medals_sorted.loc[('bronze','Germany')])

# Print data about silver medals
print(medals_sorted.loc['silver'])

# Create alias for pd.IndexSlice: idx
idx = pd.IndexSlice

# Print all the data on medals won by the United Kingdom
print(medals_sorted.loc[idx[:,'United Kingdom'],:])
~~~
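To experiment with the slicing above without the course data, here is a self-contained sketch. The `medals` frame, its `Total` column, and the counts are invented stand-ins.

```python
import pandas as pd

# Hypothetical two-level index: (medal type, country).
medals = pd.DataFrame(
    {'Total': [454.0, 1195.0, 505.0, 584.0]},
    index=pd.MultiIndex.from_tuples([
        ('bronze', 'Germany'),
        ('silver', 'United States'),
        ('bronze', 'United Kingdom'),
        ('silver', 'United Kingdom'),
    ]),
)

# Slicing a MultiIndex with .loc needs a sorted index.
medals_sorted = medals.sort_index(level=0)

idx = pd.IndexSlice

# Every medal type (outer level), one country (inner level), all columns.
uk = medals_sorted.loc[idx[:, 'United Kingdom'], :]
print(uk)
```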

~~~
# Concatenate dataframes: february
february = pd.concat(dataframes,keys=['Hardware', 'Software', 'Service'],axis=1)

# Print a summary of february (.info() prints directly and returns None)
february.info()

# Assign pd.IndexSlice: idx
idx = pd.IndexSlice

# Create the slice: slice_2_8
slice_2_8 = february.loc['Feb 2, 2015':'Feb 8, 2015', idx[:, 'Company']]

# Print slice_2_8
print(slice_2_8)
~~~

~~~
# Make the list of tuples: month_list
month_list = [('january', jan), ('february', feb), ('march', mar)]

# Create an empty dictionary: month_dict
month_dict = {}

for month_name, month_data in month_list:

    # Group month_data: month_dict[month_name]
    month_dict[month_name] = month_data.groupby('Company').sum()

# Concatenate data in month_dict: sales
sales = pd.concat(month_dict)

# Print sales
print(sales)

# Print all sales by Mediacore
idx = pd.IndexSlice
print(sales.loc[idx[:, 'Mediacore'], :])
~~~

### Outer & Inner joins

~~~
import numpy as np
import pandas as pd

A = np.arange(8).reshape(2,4) + 0.1
B = np.arange(6).reshape(2,3) + 0.2
C = np.arange(12).reshape(3,4) + 0.3

print('A:\n',A,end='\n\n')
print('B:\n',B,end='\n\n')
print('C:\n',C,end='\n\n')

print(np.hstack([B, A]) == np.concatenate([B, A], axis=1),end='\n\n')
print(np.vstack([A, C]) == np.concatenate([A, C], axis=0),end='\n\n')
~~~

Output:

~~~
A:
 [[0.1 1.1 2.1 3.1]
 [4.1 5.1 6.1 7.1]]

B:
 [[0.2 1.2 2.2]
 [3.2 4.2 5.2]]

C:
 [[ 0.3  1.3  2.3  3.3]
 [ 4.3  5.3  6.3  7.3]
 [ 8.3  9.3 10.3 11.3]]

[[ True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True]]

[[ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]]
~~~



~~~
np.concatenate([A, B], axis=0) # ValueError: column counts differ (4 vs. 3)

np.concatenate([A, C], axis=1) # ValueError: row counts differ (2 vs. 3)
~~~
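Both calls above raise `ValueError`. A small sketch that catches the errors, reusing the arrays A, B and C defined earlier, makes this visible without stopping the script:

```python
import numpy as np

A = np.arange(8).reshape(2, 4) + 0.1
B = np.arange(6).reshape(2, 3) + 0.2
C = np.arange(12).reshape(3, 4) + 0.3

errors = []
for arrays, axis in [([A, B], 0), ([A, C], 1)]:
    try:
        # All dimensions other than the concatenation axis must match.
        np.concatenate(arrays, axis=axis)
    except ValueError as exc:
        errors.append(str(exc))
        print(f'axis={axis}: {exc}')
```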

### Joins

- joining tables: combining rows of multiple tables
- Outer join
	- union of index sets (all labels, no repetition)
	- missing fields filled with NaN
- Inner join
	- intersection of index sets (only common labels)

**Concatenation & inner join**

~~~
pd.concat([population, unemployment], axis=1, join='inner')
~~~

**Concatenation & outer join**

~~~
pd.concat([population, unemployment], axis=1, join='outer') # outer: default
~~~
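The difference between the two joins can be checked on toy frames. The index values (stand-ins for zip codes) and the column names are assumptions for this sketch.

```python
import pandas as pd

# Hypothetical frames that share only the label 1002.
population = pd.DataFrame(
    {'2010 Census Population': [100, 200, 300]},
    index=[1001, 1002, 1003],
)
unemployment = pd.DataFrame({'unemployment': [0.05, 0.07]}, index=[1002, 1004])

# Inner join: keep only labels present in both indexes.
inner = pd.concat([population, unemployment], axis=1, join='inner')

# Outer join: keep every label; missing cells become NaN.
outer = pd.concat([population, unemployment], axis=1, join='outer')

print(inner)
print(outer)
```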

#### Merging on columns

~~~
pd.merge(bronze, gold, on=['NOC','Country'], suffixes=['_bronze','_gold'])
~~~

~~~
pd.merge(counties, cities, left_on='CITY NAME', right_on='City')
~~~
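A runnable sketch of both calls. All rows, plus the `Total`, `COUNTY NAME` and `Zipcode` columns, are invented for illustration; only the key column names come from the notes.

```python
import pandas as pd

# Hypothetical medal tables sharing the key columns NOC and Country.
bronze = pd.DataFrame({
    'NOC': ['USA', 'URS', 'GBR'],
    'Country': ['United States', 'Soviet Union', 'United Kingdom'],
    'Total': [1052, 584, 505],
})
gold = pd.DataFrame({
    'NOC': ['USA', 'URS', 'ITA'],
    'Country': ['United States', 'Soviet Union', 'Italy'],
    'Total': [2088, 838, 460],
})

# Same key names on both sides: use on=. The non-key 'Total' columns clash,
# so each gets the matching suffix.
merged = pd.merge(bronze, gold, on=['NOC', 'Country'], suffixes=['_bronze', '_gold'])

# Different key names on each side: use left_on= / right_on=.
counties = pd.DataFrame({
    'CITY NAME': ['State College', 'Reading'],
    'COUNTY NAME': ['Centre', 'Berks'],
})
cities = pd.DataFrame({
    'City': ['State College', 'Reading', 'Erie'],
    'Zipcode': ['16801', '19601', '16501'],
})
by_city = pd.merge(counties, cities, left_on='CITY NAME', right_on='City')

print(merged)
print(by_city)
```

Note that with `left_on`/`right_on` both key columns are kept in the result.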

#### Merging with inner join

~~~
pd.merge(bronze, gold, on=['NOC','Country'], suffixes=['_bronze','_gold'], how='inner') # inner: default
~~~

#### Merging with left join

- keeps all rows of the left DF in the merged DF
- for rows in the left DF with matches in the right DF:
	- non-joining columns of the right DF are appended to the left DF
- for rows in the left DF with no matches in the right DF:
	- non-joining columns are filled with nulls
- DataFrame.join(df) performs a left join by default (how='left')

~~~
pd.merge(bronze, gold, on=['NOC','Country'], suffixes=['_bronze','_gold'], how='left')
~~~

#### Merging with right join

- keeps all rows of the right DF in the merged DF
- for rows in the right DF with matches in the left DF:
	- non-joining columns of the left DF are appended to the right DF
- for rows in the right DF with no matches in the left DF:
	- non-joining columns are filled with nulls

~~~
pd.merge(bronze, gold, on=['NOC','Country'], suffixes=['_bronze','_gold'], how='right')
~~~

#### Merging with outer join

~~~
pd.merge(bronze, gold, on=['NOC','Country'], suffixes=['_bronze','_gold'], how='outer')
~~~
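One way to compare the four join types is to count the rows each produces. The tables below are hypothetical and use a single key column to keep the sketch short.

```python
import pandas as pd

# USA and URS appear in both tables; GBR only in bronze, ITA only in gold.
bronze = pd.DataFrame({'NOC': ['USA', 'URS', 'GBR'], 'Total': [1052, 584, 505]})
gold = pd.DataFrame({'NOC': ['USA', 'URS', 'ITA'], 'Total': [2088, 838, 460]})

row_counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    merged = pd.merge(bronze, gold, on='NOC',
                      suffixes=['_bronze', '_gold'], how=how)
    row_counts[how] = len(merged)

print(row_counts)
```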

### Which should you use?

- df1.append(df2): stacking vertically (removed in pandas 2.0; use pd.concat([df1,df2]) instead)
- pd.concat([df1,df2]):
	- stacking many horizontally or vertically
	- simple inner/outer join on Indexes
- df1.join(df2): inner/outer/left/right joins on Indexes
- pd.merge(df1, df2): many joins on multiple columns
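Of the options listed, `.join()` is the only one not shown elsewhere in these notes, so here is a minimal sketch on made-up frames (index values and column names are assumptions):

```python
import pandas as pd

population = pd.DataFrame({'population': [100, 200, 300]}, index=[1001, 1002, 1003])
unemployment = pd.DataFrame({'unemployment': [0.05, 0.07]}, index=[1002, 1004])

# Default is a left join on the index: all rows of population are kept.
joined_left = population.join(unemployment)

# Other join types are selected with how=.
joined_inner = population.join(unemployment, how='inner')

print(joined_left)
print(joined_inner)
```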

### Ordered merges

~~~
pd.merge_ordered(hardware, software, on=['Date','Company'], suffixes=['_hardware','_software'])
~~~

- By default, performs an outer join
- Equivalent to chaining pd.merge(..., how='outer') and .sort_values()

**Forward-filling**

~~~
pd.merge_ordered(stocks, gdp, on='Date', fill_method='ffill')
~~~
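Forward-filling is useful when the two tables are sampled at different dates. In this sketch the dates, prices and GDP figures are all invented; only the call itself mirrors the notes.

```python
import pandas as pd

# Hypothetical monthly stock closes and quarterly GDP readings.
stocks = pd.DataFrame({
    'Date': pd.to_datetime(['2016-01-04', '2016-02-01', '2016-03-01']),
    'Close': [100.0, 102.0, 101.0],
})
gdp = pd.DataFrame({
    'Date': pd.to_datetime(['2016-01-01', '2016-04-01']),
    'GDP': [18000.0, 18200.0],
})

# Outer join ordered by Date; gaps are filled with the last earlier value.
merged = pd.merge_ordered(stocks, gdp, on='Date', fill_method='ffill')
print(merged)
```

A row with no earlier observation to copy from, such as `Close` on the first date here, stays NaN.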