_In this kernel I explore the available datasets and look for a way to merge their information effectively._

**Understanding the Problem Statement:**
>* Use data to propose specific rule modifications for the NFL that aim to reduce the occurrence of concussions during punt plays.

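**As per this article:** [https://injury.research.chop.edu/blog/posts/research-first-line-defense-nfl-moves-improve-player-safety#.XAx9QGgzZPY](https://injury.research.chop.edu/blog/posts/research-first-line-defense-nfl-moves-improve-player-safety#.XAx9QGgzZPY)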
>* Concussions can result from a blow or jolt to the head or body that causes the brain to deform within the skull. Football helmets were designed originally to prevent skull fractures and the most serious brain injuries, but not specifically to manage the rotational forces that are an important aspect of how concussions occur.

**Data:**

* The data covers the 2016 and 2017 NFL seasons.
* Each dataset can be merged on the game, play or player level using the provided key variables. GameKey provides a unique identifier for a specific game which is unique across NFL seasons. PlayID identifies a unique play within a specified GameKey. GSISID provides a unique identifier for a player across all seasons.
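
A minimal sketch of the two merge levels described above, using a few made-up rows (the column names follow the data description; the values are hypothetical). Player-level tables join on `GSISID` alone, while play-level tables need `GameKey` and `PlayID` together:

```python
import pandas as pd

# Hypothetical toy rows that mimic the key columns described above
play_role = pd.DataFrame({
    "GameKey": [5, 5, 21],
    "PlayID": [3129, 3129, 2587],
    "GSISID": [31057, 32120, 31057],
    "Role": ["PR", "GL", "PR"],
})
player_info = pd.DataFrame({
    "GSISID": [31057, 32120],
    "Position": ["WR", "CB"],
})
injuries = pd.DataFrame({
    "GameKey": [5],
    "PlayID": [3129],
    "GSISID": [32120],
})

# Player level: GSISID alone identifies a player across seasons.
# validate="many_to_one" makes pandas raise if player_info has duplicate GSISIDs.
roles_with_position = play_role.merge(
    player_info, on="GSISID", how="left", validate="many_to_one")

# Play level: PlayID is only unique within a game, so join on both keys
injured_roles = injuries.merge(
    play_role, on=["GameKey", "PlayID", "GSISID"], how="left")

print(roles_with_position)
print(injured_roles)
```

The `validate` check is a design choice: a silent many-to-many join would duplicate play rows without any warning.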

In [None]:
import os
import warnings

import numpy as np
import pandas as pd
import missingno as msno
%matplotlib inline

# Show every column when printing wide DataFrames
pd.set_option('display.max_columns', None)
warnings.filterwarnings('ignore')

# Input data files are available in the "../input/" directory
print(os.listdir("../input"))

In [None]:
print("Total number of files we are dealing with: ",len(os.listdir("../input")))

In [None]:
# Load one of the Next Gen Stats (NGS) tracking files: 2016 regular season, weeks 13-17
data = pd.read_csv("../input/NGS-2016-reg-wk13-17.csv")

In [None]:
# Check the shape, column names, first rows and distinct Event values
print("Dimension of Data Frame:", data.shape)
print("------------------------------")
print(data.columns)
print("------------------------------")
print(data.head())
print("------------------------------")
events = data.Event.sort_values().unique()
print("Unique values in Event column:", events, "having length", len(events))
print("------------------------------")

In [None]:
## NA count per column
print("NA count: \n", data.isnull().sum())
data.isnull().sum().plot.bar(xlabel='Column', ylabel='NA count')

In [None]:
## checking data availability
msno.matrix(data)

In [None]:
msno.bar(data)

In [None]:
## Total distance ('dis') recorded per Event
data.groupby('Event')['dis'].sum().sort_values().plot.bar(xlabel='Event', ylabel='Sum of dis')

In [None]:
# Maximum PlayID and dis observed for each Event
data.groupby('Event')[['PlayID', 'dis']].max().sort_values(by='PlayID')

In [None]:
# Parse the Time strings into datetimes
data['Time'] = pd.to_datetime(data['Time'], format='%Y-%m-%d %H:%M:%S.%f')
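
Once `Time` is parsed, it can be expressed relative to each player's track on a play. A minimal sketch, assuming the NGS files carry `GameKey`, `PlayID` and `GSISID` alongside `Time` and `dis`, and treating `dis` as the distance covered since the previous sample (the rows below are made up). If `dis` is in yards, `speed` comes out in yards per second:

```python
import pandas as pd

# Hypothetical NGS-like rows: two samples per player on one play
ngs = pd.DataFrame({
    "GameKey": [5, 5, 5, 5],
    "PlayID": [3129, 3129, 3129, 3129],
    "GSISID": [31057, 31057, 32120, 32120],
    "Time": ["2016-12-04 18:01:00.000", "2016-12-04 18:01:00.100",
             "2016-12-04 18:01:00.000", "2016-12-04 18:01:00.100"],
    "dis": [0.0, 0.8, 0.0, 0.5],
})
ngs["Time"] = pd.to_datetime(ngs["Time"], format="%Y-%m-%d %H:%M:%S.%f")

# One track = one player on one play
keys = ["GameKey", "PlayID", "GSISID"]
ngs = ngs.sort_values(keys + ["Time"])

# Seconds since the first sample of each track
first_time = ngs.groupby(keys)["Time"].transform("min")
ngs["elapsed_s"] = (ngs["Time"] - first_time).dt.total_seconds()

# Time step between consecutive samples of the same track (NaN for the first one)
dt = ngs.groupby(keys)["Time"].diff().dt.total_seconds()

# Speed estimate: distance since the previous sample divided by the time step
ngs["speed"] = ngs["dis"] / dt
print(ngs)
```

Computing the time step from `Time` itself, rather than assuming a fixed sampling rate, keeps the estimate correct if samples are missing.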

## Analyzing player_punt_data.csv

> * Player Punt Data: Player level data that specifies the traditional football position for each player. Each player is identified using his GSISID.

**Player Punt Data**
* _Player punt data assigns each player their typical football position._

> * GSISID: Unique player identification, unique across seasons (#####)
> * Position: Typical player position - not punt specific (ABC)
> * Number: Player jersey number (##)
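
Before merging this table on `GSISID`, it is worth checking whether a player appears in more than one row (for example with different jersey numbers). A minimal sketch on made-up rows; keeping the first row per player is only one possible simplification and drops the alternative numbers:

```python
import pandas as pd

# Hypothetical rows: one player listed twice with different jersey numbers
player_punt = pd.DataFrame({
    "GSISID": [31057, 31057, 32120],
    "Number": ["15", "84", "24"],
    "Position": ["WR", "WR", "CB"],
})

# Distinct numbers and positions recorded for each player
per_player = player_punt.groupby("GSISID").agg(
    n_numbers=("Number", "nunique"),
    n_positions=("Position", "nunique"),
)
multi = per_player[(per_player.n_numbers > 1) | (per_player.n_positions > 1)]
print(multi)

# One row per player, keeping the first listed entry
one_per_player = player_punt.drop_duplicates(subset="GSISID", keep="first")
print(one_per_player)
```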

In [None]:
player_punt = pd.read_csv('../input/player_punt_data.csv')
player_punt.head()

In [None]:
print("Dimension of Player Punt Data: ",player_punt.shape)

print("-----------------------------")

print("Unique # of entries in Number Column: ", len(player_punt.Number.unique()))

print("-----------------------------")

In [None]:
## Number of players per jersey number
player_punt.groupby('Number')['Position'].count().sort_values().plot(figsize=(18,30),kind='barh')

In [None]:
## Number of players per position
player_punt.groupby('Position')['Number'].count().sort_values().plot(figsize=(10,5),kind='bar')


## **Analyze: play_player_role_data.csv**
### **Play Player Role Data**
* Player Play Role data assigns each player a punt-specific role. These roles may differ by player between plays. This table also defines all players in each punt play. See the Appendix for a diagram of the Role definitions.

> * Season_Year: NFL Season (YYYY)
> * GameKey: Numeric game identifier, unique across seasons (#####)
> * PlayID: Numeric play identifier, not unique across games, requires GameKey (####)
> * GSISID: Unique player identification, unique across seasons (#####)
> * Role: Punt specific player information (see diagram in appendix) (ABC)
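
Because PlayID repeats across games, per-play summaries should group on `GameKey` and `PlayID` together. A minimal sketch on made-up rows where PlayID 188 occurs in two games:

```python
import pandas as pd

# Hypothetical rows: PlayID 188 occurs in two different games
play_player_role = pd.DataFrame({
    "Season_Year": [2016, 2016, 2016, 2017],
    "GameKey": [5, 5, 21, 400],
    "PlayID": [188, 188, 188, 1000],
    "GSISID": [31057, 32120, 31057, 33000],
    "Role": ["PR", "GL", "PR", "P"],
})

# Grouping by PlayID alone lumps unrelated plays from different games together
by_playid_only = play_player_role.groupby("PlayID").size()

# Grouping by (GameKey, PlayID) keeps each play separate
by_play = play_player_role.groupby(["GameKey", "PlayID"]).size()

print(by_playid_only)
print(by_play)
```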


In [None]:
play_player_role = pd.read_csv('../input/play_player_role_data.csv')
play_player_role.head()

In [None]:
print("Dimension of Play Player Role Data: ",play_player_role.shape)

print("-----------------------------")

In [None]:
play_player_role.Season_Year.unique()

In [None]:
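# PlayID is not unique across games, so this returns rows from every game that has a PlayID of 188
play_player_role[play_player_role.PlayID == 188]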

## Video Review

The Video Review database contains play and player information for each identifiable play that was associated with a concussion. For each injured player, the Primary Exposure is the impact that is observed to be markedly more severe than any other exposure during that play and was considered to be the primary source of the concussion. In some cases, the injury-producing play can be identified, but the “Primary” event (helmet to helmet, helmet to body) cannot be identified. The “Primary Impact” is listed as Unclear if the video coverage was adequate to observe all the events experienced by the player, but the competing exposures could not be differentiated to identify a primary. For plays in which the video coverage was not sufficient to visualize the player’s exposures, the primary exposure is listed as Indeterminate. The video review dataset contains data for the primary impact only.

Within the video review database, the prefix “Player” indicates the concussed player and “Partner” indicates the collision partner when applicable. If both the player and partner are concussed, then each player will be listed as a player.

> * Season_Year: NFL Season (YYYY)
> * GameKey: Numeric game identifier, unique across seasons (#####)
> * PlayID: Numeric play identifier, not unique across games, requires GameKey (####)
> * GSISID: Unique player identification, unique across seasons (#####)
> * Player_Activity_Derived: Player activity during primary injury causing event
>   * Blocked: Player was blocked
>   * Blocking: Player was blocking
>   * Tackled: Player was tackled
>   * Tackling: Player was tackling
>   * Diving/Leaping: Player was diving or leaping
>   * Other: Other activity
> * Turnover_Related: Identifies concussions that were related to a turnover during the play
>   * Yes: Concussion causing event related to a turnover
>   * No: Turnover had no relation to concussion
>   * NA: Not applicable
> * Primary_Impact_Type: Categorical variable defining the impacting source that caused the concussion
>   * Helmet-to-body: Helmet to partner's body impact
>   * Helmet-to-ground: Helmet to ground impact
>   * Helmet-to-helmet: Helmet to helmet impact
>   * Indeterminate: Primary exposure could not be visualized
>   * Unclear: Primary exposure could not be differentiated from other contacts
>   * Unidentifiable: Injury play could not be identified
> * Primary_Partner_GSISID: Unique player identification, impacting player involved with primary helmet impact (not applicable for helmet to ground impacts) (#####)
> * Primary_Partner_Activity_Derived: Partner activity during primary injury causing event
>   * Blocked: Partner was blocked
>   * Blocking: Partner was blocking
>   * Tackled: Partner was tackled
>   * Tackling: Partner was tackling
>   * Diving/Leaping: Partner was diving or leaping
>   * Other: Other activity
> * Friendly_Fire: Friendly fire occurs when the primary impact results from contact between two players on the same team
>   * Yes: Player and partner on same team
>   * No: Player and partner on different teams
>   * Indeterminate: Primary exposure could not be visualized
>   * Unclear: Primary exposure could not be differentiated from other contacts
>   * Unidentifiable: Injury play could not be identified
>   * NA: Not applicable, e.g. helmet to ground impact
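
Combining the video review with the play player role table gives the punt role of both the concussed player and the collision partner. A minimal sketch on made-up rows; it assumes `Primary_Partner_GSISID` may be stored as text (with non-numeric or missing entries for helmet-to-ground impacts), so it is coerced to numbers first:

```python
import pandas as pd

# Hypothetical rows shaped like the video review and play player role tables
video_review = pd.DataFrame({
    "Season_Year": [2016, 2017],
    "GameKey": [5, 400],
    "PlayID": [3129, 1000],
    "GSISID": [32120, 33000],
    "Primary_Impact_Type": ["Helmet-to-helmet", "Helmet-to-ground"],
    "Primary_Partner_GSISID": ["31057", None],
})
play_player_role = pd.DataFrame({
    "Season_Year": [2016, 2016, 2017],
    "GameKey": [5, 5, 400],
    "PlayID": [3129, 3129, 1000],
    "GSISID": [32120, 31057, 33000],
    "Role": ["GL", "PR", "P"],
})
keys = ["Season_Year", "GameKey", "PlayID"]

# Non-numeric or missing partner IDs become NaN (the column becomes float)
video_review["Primary_Partner_GSISID"] = pd.to_numeric(
    video_review["Primary_Partner_GSISID"], errors="coerce")

# Role of the concussed player on that play
merged = video_review.merge(play_player_role, on=keys + ["GSISID"], how="left")

# Role of the collision partner on the same play; cast the key to float to match
partner_roles = play_player_role.rename(
    columns={"GSISID": "Primary_Partner_GSISID", "Role": "Partner_Role"})
partner_roles["Primary_Partner_GSISID"] = partner_roles["Primary_Partner_GSISID"].astype(float)
merged = merged.merge(partner_roles, on=keys + ["Primary_Partner_GSISID"], how="left")

print(merged[["GameKey", "PlayID", "Role", "Partner_Role", "Primary_Impact_Type"]])
```

Rows without a partner (such as helmet-to-ground impacts) simply keep `Partner_Role` as NaN.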

In [None]:
video_control_data = pd.read_csv('../input/video_footage-control.csv')
video_control_data.head()

In [None]:
video_injury_data = pd.read_csv('../input/video_footage-injury.csv')
video_injury_data.head()

In [None]:
video_injury_data.PlayDescription[0]
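
The free-text `PlayDescription` can be mined for structured values. A minimal sketch, assuming descriptions use the usual play-by-play wording "punts NN yards" (the strings and player names below are hypothetical); rows without that phrase, such as penalties, become NaN:

```python
import pandas as pd

# Hypothetical descriptions in the style of NFL play-by-play text
desc = pd.Series([
    "(Punt formation) J.Smith punts 47 yards to NE 20, Center-A.Brown. "
    "R.Jones to NE 32 for 12 yards (K.White).",
    "(Punt formation) PENALTY on ABC-T.Green, False Start, 5 yards, "
    "enforced at ABC 30 - No Play.",
])

# Capture the number in "punts NN yards" and convert it to a number
punt_yards = pd.to_numeric(
    desc.str.extract(r"punts (\d+) yards", expand=False), errors="coerce")
print(punt_yards)
```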

In [None]:
video_review_data = pd.read_csv('../input/video_review.csv')
video_review_data.head()