## Comprehensive Exam - Incident Impulse Plot Using Kingery-Bulmash Data

This notebook cleans the digitized data from the 1984 Kingery-Bulmash study.

### Data Sources
- file1: The data came from digitizing the original plots in the Kingery-Bulmash paper. The curve fit and four sets of reference data were digitized:
    - The Kingery-Bulmash curve fit (separate file)
    - The plot of data from reference 2
    - The plot of data from references 6 & 7
    - The plot of data from reference 8
    - The plot of data from reference 9

### Changes
- 12-17-2018 : Started project

```python
import pandas as pd
from pathlib import Path
from datetime import datetime
```

### File Locations

```python
today = datetime.today()

# Project root is three levels above the notebook's working directory
data_dir = Path.cwd().parents[2] / "2_data" / "kingery-bulmash_scaled_incident_impulse"
in_file_data_cv = data_dir / "raw" / "KB_curve.csv"
in_file_data_rf = data_dir / "raw" / "scaled_incident_impulse_data.csv"
summary_file = data_dir / "processed" / f"summary_{today:%b-%d-%Y}.pkl"
```
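The paths above assume the notebook runs three directory levels below the project root that contains `2_data`. The sketch below shows how `parents[2]` and the `%b-%d-%Y` format spec resolve, using a hypothetical directory layout and a fixed date (12-17-2018, the project start date) so the result is deterministic. `PurePosixPath` is used so nothing on disk is touched. Note that `%b` is locale dependent; under the default C locale it gives `Dec`.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Hypothetical notebook location; the real layout is only known to sit
# three levels below the project root that holds "2_data".
notebook_dir = PurePosixPath("/home/user/project/3_code/notebooks/cleaning")

# parents[0] is the immediate parent, so parents[2] climbs three levels.
project_root = notebook_dir.parents[2]
print(project_root)  # /home/user/project

# Same f-string format spec as summary_file, applied to a fixed date.
stamp = datetime(2018, 12, 17)
summary_name = f"summary_{stamp:%b-%d-%Y}.pkl"
print(summary_name)  # summary_Dec-17-2018.pkl
```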

```python
df_rf = pd.read_csv(in_file_data_rf)
df_rf.head()
```

|   | x | y | Reference |
|---|---|---|---|
| 0 | 0.136 | 296.879 | REF 9 |
| 1 | 0.177 | 192.859 | REF 9 |
| 2 | 0.251 | 91.418 | REF 9 |
| 3 | 0.378 | 34.135 | REF 9 |
| 4 | 0.503 | 22.716 | REF 9 |


In [43]:
```python
df_cv = pd.read_csv(in_file_data_cv)
df_cv.head()
```

|   | x | y | Reference |
|---|---|---|---|
| 0 | 0.136532 | 285.768505 | kb_curve |
| 1 | 0.173323 | 189.008419 | kb_curve |
| 2 | 0.207692 | 130.287052 | kb_curve |
| 3 | 0.240823 | 95.950375 | kb_curve |
| 4 | 0.270207 | 76.12163 | kb_curve |


### Column Cleanup

- Remove leading and trailing spaces from the column names.
- Rename the `x` and `y` columns to descriptive names for consistency.

```python
# https://stackoverflow.com/questions/30763351/removing-space-in-dataframe-python
df_cv.columns = [x.strip() for x in df_cv.columns]
df_rf.columns = [x.strip() for x in df_rf.columns]
```

```python
# Give the digitized axes descriptive names
cols_to_rename = {'x': 'scaled_distance', 'y': 'scaled_incident_impulse'}
df_cv = df_cv.rename(columns=cols_to_rename)
df_rf = df_rf.rename(columns=cols_to_rename)
```
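The two cleanup steps can be checked on a toy frame. This sketch assumes the raw headers carried stray spaces (the notebook strips them but never shows the originals) and chains both steps with `DataFrame.rename`, which accepts a callable such as `str.strip` as well as a dict.

```python
import pandas as pd

# Toy frame mimicking digitizer output with stray spaces in the headers
# (hypothetical; the actual raw headers are not shown in this notebook).
raw = pd.DataFrame({
    " x": [0.136, 0.177],
    "y ": [296.879, 192.859],
    " Reference": ["REF 9", "REF 9"],
})

# Strip whitespace from every column label, then apply descriptive names.
clean = raw.rename(columns=str.strip).rename(
    columns={"x": "scaled_distance", "y": "scaled_incident_impulse"}
)
print(list(clean.columns))
```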

### Clean Up Data Types

```python
df_cv.dtypes
```

```
scaled_distance            float64
scaled_incident_impulse    float64
Reference                   object
dtype: object
```
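Here both numeric columns already read in as `float64`. If a digitized column had instead come in as `object` (for example because of a stray text entry), one option is `pd.to_numeric` with `errors="coerce"`. This is a hypothetical sketch on toy data, not a step this notebook performs.

```python
import pandas as pd

# Hypothetical case: distances read in as text, with one unparseable entry.
df = pd.DataFrame({
    "scaled_distance": ["0.136", "0.177", "n/a"],
    "scaled_incident_impulse": [296.879, 192.859, 10.0],
})

# Convert to float; strings that cannot be parsed become NaN instead of raising.
df["scaled_distance"] = pd.to_numeric(df["scaled_distance"], errors="coerce")

# Drop rows that could not be converted and renumber the index.
df = df.dropna(subset=["scaled_distance"]).reset_index(drop=True)
print(df.dtypes)
```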

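### Data Manipulation

Combine the curve fit and the reference data into a single DataFrame.

```python
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df_tot = pd.concat([df_cv, df_rf], ignore_index=True)
```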
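A small sketch of the stacking step, using the first two rows of each `head()` output above as stand-ins for `df_cv` and `df_rf`. With `ignore_index=True` the result gets a fresh 0..n-1 index, and the `Reference` column keeps track of where each row came from.

```python
import pandas as pd

# Stand-ins for df_cv and df_rf, taken from the head() outputs above.
curve = pd.DataFrame({
    "scaled_distance": [0.136532, 0.173323],
    "scaled_incident_impulse": [285.768505, 189.008419],
    "Reference": ["kb_curve", "kb_curve"],
})
refs = pd.DataFrame({
    "scaled_distance": [0.136, 0.177],
    "scaled_incident_impulse": [296.879, 192.859],
    "Reference": ["REF 9", "REF 9"],
})

# Stack the rows; ignore_index=True renumbers 0..3 instead of repeating 0, 1.
combined = pd.concat([curve, refs], ignore_index=True)

# The Reference column still identifies the source of every row.
print(combined["Reference"].value_counts())
```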

### Save output file into processed directory

Save the cleaned data to the processed directory. The file will be read back in and used later for further analysis.

Other options besides pickle include:
- feather
- msgpack (support was removed in pandas 1.0)
- parquet

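```python
# Create the processed directory if it does not exist yet
summary_file.parent.mkdir(parents=True, exist_ok=True)
df_tot.to_pickle(summary_file)
```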
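The later analysis step would read the pickle back with `pd.read_pickle`. A minimal round-trip sketch on a toy frame, written to a temporary directory rather than the real `summary_file` (the file name `summary_example.pkl` is hypothetical):

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    "scaled_distance": [0.136, 0.177],
    "scaled_incident_impulse": [296.879, 192.859],
    "Reference": ["REF 9", "REF 9"],
})

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for summary_file; the real name is dated at run time.
    path = Path(tmp) / "summary_example.pkl"
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# Pickle preserves the values, column order and dtypes.
pd.testing.assert_frame_equal(df, restored)
print("round trip ok")
```

Pickle keeps dtypes intact, but pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code. Feather and parquet are more portable across pandas versions but need an extra engine such as `pyarrow`.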