# Phase 1 Code Challenge
This code challenge is designed to test your understanding of the Phase 1 material. It covers:

- Pandas
- Data Visualization
- Exploring Statistical Data
- Python Data Structures

*Read the instructions carefully.* Your code will need to meet detailed specifications to pass automated tests.

## Code Tests

We have provided some code tests for you to run to check that your work meets the item specifications. Passing these tests does not necessarily mean that you have gotten the item correct - there are additional hidden tests. However, if any of the tests do not pass, this tells you that your code is incorrect and needs changes to meet the specification. To determine what the issue is, read the comments in the code test cells, the error message you receive, and the item instructions.

---
## Part 1: Pandas [Suggested Time: 15 minutes]
---
In this part, you will preprocess a dataset from the video game [FIFA19](https://www.kaggle.com/karangadiya/fifa19), which contains data from the players' real-life careers.

In [None]:
# Run this cell without changes

import pandas as pd
import numpy as np
from numbers import Number
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt

### 1.1) Read `fifa.csv` into a pandas DataFrame named `df`

Use pandas to create a new DataFrame, called `df`, containing the data from the dataset in the file `fifa.csv` in the folder containing this notebook. 

Hint: Use the string `'./fifa.csv'` as the file reference.

In [None]:
## load data into dataframe
df = pd.read_csv('./AviationData.csv', encoding='latin-1')

In [None]:
df.info()

In [None]:
## removing Month and Day from the Event.Date column
df['Event.Date'] = df['Event.Date'].str[:-6]
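
The slice above only works if every date is a `YYYY-MM-DD` string. As a minimal sketch under that same assumption, the year can also be extracted by parsing the dates. The `events` frame and its values below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the 'Event.Date' column; these dates are invented.
events = pd.DataFrame({'Event.Date': ['2008-12-31', '2009-01-15', '2011-07-04']})

# Parse the strings as dates, then keep only the year component.
events['Year'] = pd.to_datetime(events['Event.Date'], format='%Y-%m-%d').dt.year

print(events['Year'].tolist())
# [2008, 2009, 2011]
```

Parsing raises an error on malformed dates instead of silently producing a wrong substring, and the result is already numeric.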

In [None]:
## counting incidents per year (value_counts returns a Series), sorted by year
df_by_year = df['Event.Date'].value_counts()
df_by_year = df_by_year.sort_index()
df_by_year

In [None]:
## converting Event.Date column to type int
df['Event.Date'] = df['Event.Date'].astype(np.int64)

In [None]:
df.info()

In [None]:
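## creating new dataframe of incidents from 2009 onward
## .copy() makes it independent of df, so later column assignments do not act on a view
df_after_09 = df.loc[df['Event.Date'] >= 2009].copy()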

In [None]:
df_after_09.head()

In [None]:
## dropping columns which are not needed for EDA
df_after_09 = df_after_09.drop(columns=['Accident.Number', 'Location', 'Latitude', 'Longitude',
                                        'Airport.Code', 'Airport.Name', 'Registration.Number',
                                        'Schedule', 'Air.carrier', 'Report.Status', 'Publication.Date'])

In [None]:
df_after_09.info()

In [None]:
## converting all values in column 'Make' to lower case strings
df_after_09['Make'] = df_after_09['Make'].astype(str).str.lower()


In [None]:
## storing column 'Make' with dtype object
df_after_09['Make'] = df_after_09['Make'].astype(object)

In [None]:
## Boeing is listed in several different formats in the data, e.g. 'The Boeing Company'.
## This code checks every value in 'Make' for the substring 'boeing' and replaces it with the standard 'boeing'.

df_after_09.loc[df_after_09['Make'].str.contains('boeing'), 'Make'] = 'boeing'


In [None]:

def normalize_company_names(df, column_name, company_name):
    '''Takes in the dataframe df and checks every value in column_name for the substring company_name.
        If the substring is present, the value is overwritten with company_name.
        Modifies df in place and returns it.'''
    # A single .loc call with a row mask and a column label avoids chained assignment
    df.loc[df[column_name].str.contains(company_name), column_name] = company_name
    return df
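
As a minimal sketch of the same substring rule, the variant below works on a copy and tolerates missing values and mixed case. The name `normalize_make` and the sample rows are hypothetical, chosen only for illustration. They are not part of the notebook.

```python
import pandas as pd

def normalize_make(frame, column_name, company_name):
    """Return a copy of frame in which every value of column_name that
    contains company_name is replaced by company_name."""
    result = frame.copy()
    # case=False ignores capitalisation, na=False treats missing values as
    # non-matches, regex=False matches the name as plain text.
    mask = result[column_name].str.contains(company_name, case=False, na=False, regex=False)
    result.loc[mask, column_name] = company_name
    return result

# Hypothetical sample rows, including a missing value.
makes = pd.DataFrame({'Make': ['the boeing company', 'Boeing', 'cessna aircraft co', None]})
cleaned = normalize_make(makes, 'Make', 'boeing')
print(cleaned)
```

Returning a copy leaves the input frame untouched, which makes the step safe to re-run in a notebook.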


In [None]:
## Checking function by normalizing all companies with 'piper'
normalize_company_names(df_after_09, 'Make', 'piper')

In [None]:
## Normalizing the company names of the top aircraft manufacturers
normalize_company_names(df_after_09, 'Make', 'boeing')
normalize_company_names(df_after_09, 'Make', 'airbus')
normalize_company_names(df_after_09, 'Make', 'cessna')
normalize_company_names(df_after_09, 'Make', 'beech')
normalize_company_names(df_after_09, 'Make', 'cirrus')
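
The repeated calls above can also be written as a loop over the manufacturer names. This is a sketch on invented sample values, assuming the names are already lower-cased. It applies the substring rule inline rather than calling the notebook's function, so that it runs on its own.

```python
import pandas as pd

# Hypothetical sample of 'Make' values, already lower-cased.
makes = pd.DataFrame({'Make': ['airbus industrie', 'cessna aircraft', 'beechcraft',
                               'cirrus design corp', 'robinson']})

# Apply the same substring rule once per manufacturer name.
for name in ['boeing', 'airbus', 'cessna', 'beech', 'cirrus']:
    mask = makes['Make'].str.contains(name, regex=False)
    makes.loc[mask, 'Make'] = name

print(makes['Make'].tolist())
# ['airbus', 'cessna', 'beech', 'cirrus', 'robinson']
```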


In [None]:
## Looking at the top 20 companies by number of incidents
df_after_09['Make'].value_counts().head(20)
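
For reference, a small sketch of how the ranking works: `value_counts()` sorts counts in descending order, so taking the first rows gives the most frequent values. The sample values are invented.

```python
import pandas as pd

# Invented 'Make' values: three cessna, two piper, one boeing.
makes = pd.Series(['cessna', 'piper', 'cessna', 'boeing', 'cessna', 'piper'], name='Make')

# Counts come back sorted from most to least frequent; keep the top two.
top_two = makes.value_counts().head(2)

print(top_two.to_dict())
# {'cessna': 3, 'piper': 2}
```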