# **NFL Combine 2022**

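# **Importing Necessary Python Modules**

Python supports a variety of open-source add-ins called **modules** (libraries) that extend what the language can do. The modules imported below let us load and manipulate tabular data (pandas), perform numerical computations (NumPy), create interactive charts (Plotly Express), and display images in the notebook (IPython).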

In [2]:
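import pandas as pd                # tabular data (DataFrames)
import numpy as np                 # numerical computing
import plotly.express as px        # interactive charts
from IPython.display import Image  # display images inside the notebook
import warnings
warnings.simplefilter('ignore', FutureWarning)  # hide FutureWarning messages (e.g., deprecation notices)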

# **Context**

The National Invitational Camp (NIC), more commonly known as the NFL Scouting Combine, began in 1982, when National Football Scouting, Inc. first conducted a camp for its member NFL clubs in Tampa, Florida. The key purpose then, as it is today, was to gather medical information on the top draft-eligible prospects in college football. The inaugural NIC was attended by 163 players and laid the foundation for future expansion.

As football and the art of evaluating players have evolved, so has the NFL Scouting Combine. While medical examinations remain the top priority of the event, athletes also participate in a variety of psychological and physical tests, as well as formal and informal interviews with top executives, coaches, and scouts from all 32 NFL teams. The NIC is the ultimate four-day job interview for the top college football players eligible for the upcoming NFL Draft.

In [3]:
# Replace 'image_url' with the URL of the image you want to display
image_url = 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQNohK9gczwXkLpwPmn_SL5mlzpwHp6rMDR8w&usqp=CAU'

# Display the image
Image(url=image_url)


# **About the Dataset**
This dataset contains 133 rows corresponding to a random sample of drafted players. Nine variables are provided, as listed below:

**Variables**

| Column name         | Description                                                 |
|---------------------|-------------------------------------------------------------|
| Player              | Player ID, which is the player’s name                       |
| Pos                 | Position the player plays                                   |
| School              | College the player attended                                 |
| Ht                  | Player height in inches                                     |
| Wt                  | Player weight in lbs                                        |
| 40yd                | Time to run the 40-yard dash, in seconds                    |
| Vertical            | Vertical jump height in inches                              |
| Broad Jump          | Horizontal distance covered in inches (aka long jump)       |
| Drafted (tm/rnd/yr) | Team, round, overall pick, and year the player was drafted  |




*Attribution:  FiveThirtyEight.com*

We can view a snippet of the data by first importing it directly from the URL below.

**Data**

In [4]:
# Read the CSV file directly from GitHub into a pandas DataFrame
# (if this fails, check the URL in file_path for typos)
file_path = "https://raw.githubusercontent.com/ksuaray/LAEP_S24/NFL-Combine/NFLCombine.csv"
df = pd.read_csv(file_path)


Next, we can display the data by typing the name of the DataFrame. To ensure that all columns are shown, we'll first call the *pd.set_option* function.

In [5]:
# Set display options to show all columns
pd.set_option('display.max_columns', None)
df

| Unnamed: 0 | Player | Pos | School | Ht | Wt | 40yd | Vertical | Broad Jump | Drafted (tm/rnd/yr) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Myjai Sanders | EDGE | Cincinnati | 77 | 228 | 4.67 | 33.0 | 120 | Arizona Cardinals / 3rd / 100th pick / 2022 |
| 1 | Keaontay Ingram | RB | USC | 73 | 221 | 4.53 | 34.5 | 122 | Arizona Cardinals / 6th / 201st pick / 2022 |
| 2 | Jesse Luketa | LB | Penn St. | 75 | 253 | 4.89 | 37.5 | 114 | Arizona Cardinals / 7th / 256th pick / 2022 |
| 3 | Marquis Hayes | OG | Oklahoma | 77 | 318 | 5.30 | 23.5 | 102 | Arizona Cardinals / 7th / 257th pick / 2022 |
| 4 | Troy Andersen | LB | Montana St. | 76 | 243 | 4.42 | 36.0 | 128 | Atlanta Falcons / 2nd / 58th pick / 2022 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 128 | Jahan Dotson | WR | Penn St. | 71 | 178 | 4.43 | 36.0 | 121 | Washington Commanders / 1st / 16th pick / 2022 |
| 129 | Brian Robinson | RB | Alabama | 74 | 225 | 4.53 | 30.0 | 119 | Washington Commanders / 3rd / 98th pick / 2022 |
| 130 | Percy Butler | S | Louisiana | 73 | 194 | 4.36 | 31.5 | 123 | Washington Commanders / 4th / 113th pick / 2022 |
| 131 | Cole Turner | TE | Nevada | 78 | 246 | 4.76 | 27.0 | 120 | Washington Commanders / 5th / 149th pick / 2022 |
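The `Unnamed: 0` column appears to be a row index that was saved into the CSV file along with the data (an assumption based on its values 0, 1, 2, and so on). A minimal sketch: passing `index_col=0` to `pd.read_csv` would use that column as the DataFrame index instead of treating it as a variable. To run without network access, the example reads a few rows copied from the output above out of an in-memory string.

```python
import io

import pandas as pd

# A few rows copied from the output above, including the blank-named first column
csv_text = """,Player,Pos,School,Ht,Wt,40yd,Vertical,Broad Jump,Drafted (tm/rnd/yr)
0,Myjai Sanders,EDGE,Cincinnati,77,228,4.67,33.0,120,Arizona Cardinals / 3rd / 100th pick / 2022
1,Keaontay Ingram,RB,USC,73,221,4.53,34.5,122,Arizona Cardinals / 6th / 201st pick / 2022
2,Jesse Luketa,LB,Penn St.,75,253,4.89,37.5,114,Arizona Cardinals / 7th / 256th pick / 2022
"""

# Default behavior: pandas names the blank first column "Unnamed: 0"
with_unnamed = pd.read_csv(io.StringIO(csv_text))
print(with_unnamed.columns.tolist()[:2])  # ['Unnamed: 0', 'Player']

# index_col=0 turns that first column into the row index instead
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist()[:2])  # ['Player', 'Pos']
print(clean.shape)                 # (3, 9)
```

In the notebook this would be `pd.read_csv(file_path, index_col=0)`; the rest of the analysis selects columns by name, so it does not depend on the extra column either way.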


# **ASSIGNMENT 1 - Descriptive Statistics: Graphical and Numerical Summary**

**INSTRUCTIONS**
Use SPSS to analyze the data set and complete each of the following. As appropriate, copy the SPSS output and paste it in the correct part below. For problems that require a written response, type the answer below!

## **QUESTION 1**
Determine whether the variables below are qualitative or quantitative. If they are quantitative, specify whether they are continuous or discrete.

| Variable         | Qual or Quant | Dis, Con, or Neither |
|------------------|---------------|----------------------|
| **School**              |   |             |
| **Wt**           |   |                |
| **Vertical**           |    |         |


## **QUESTION 2**

Construct a frequency table, relative frequency table, and bar chart to describe the distribution of the Pos variable. State any fact that stands out to you:

In [6]:
#Frequency table
mc = df['Pos'] #SOLUTION
freq_table = mc.value_counts()  # Series method; top-level pd.value_counts() is deprecated in pandas 2.1+
freq_table

WR      20
RB      17
OG      17
LB      15
OT      15
EDGE    10
S        9
DE       6
DT       5
C        5
CB       5
QB       4
TE       4
P        1
Name: Pos, dtype: int64

In [7]:
len(freq_table) > 0

True

In [8]:
# HIDDEN
len(freq_table) == 14

True

In [9]:
# END TESTS

In [10]:
# END QUESTION

In [11]:
#Relative frequency table
freq_table/len(df) #SOLUTION

WR      0.150376
RB      0.127820
OG      0.127820
LB      0.112782
OT      0.112782
EDGE    0.075188
S       0.067669
DE      0.045113
DT      0.037594
C       0.037594
CB      0.037594
QB      0.030075
TE      0.030075
P       0.007519
Name: Pos, dtype: float64
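As an alternative to dividing by `len(df)`, `Series.value_counts` accepts `normalize=True`, which returns proportions directly. The sketch below builds a combined frequency and relative-frequency table; the short `positions` Series is a hypothetical stand-in for `df['Pos']`.

```python
import pandas as pd

# Hypothetical stand-in for df['Pos']; replace with the real column
positions = pd.Series(["WR", "RB", "WR", "LB", "RB", "WR", "P", "LB"], name="Pos")

counts = positions.value_counts()                     # frequency
proportions = positions.value_counts(normalize=True)  # relative frequency

# Combine both into one table; the columns are named explicitly so the
# result does not depend on the default Series names pandas assigns
table = pd.DataFrame({"Frequency": counts, "Relative Frequency": proportions})
print(table)
print(table["Relative Frequency"].sum())  # proportions should add up to 1
```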

In [12]:
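# Bar chart of the position counts; plotting freq_table directly avoids relying
# on the column name pandas gives value_counts() output ('Pos' or 'count')
fig = px.bar(x=freq_table.index, y=freq_table.values,
             title='Frequency Distribution Bar Chart')
fig.update_layout(xaxis_title='Position', yaxis_title='Frequency')
fig.show()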

## **QUESTIONS 3-6**

For questions 3-6: Find your variable based on your last name and use that variable when answering questions #3 to #6.  

| Last Name | Variable   |
|-----------|------------|
| A-F       | Vertical   |
| G-M       | Broad Jump |
| N-S       | 40yd       |
| T-Z       | Wt         |


### **QUESTION 3**

Construct a histogram for your variable. Use Number of Intervals = 12.

In [13]:
#Histogram of Wt
fig = px.histogram(x=df['Wt'],nbins = 12)
fig.show()

In [14]:
#Histogram of 40yd
fig = px.histogram(x=df['40yd'],nbins = 12)
fig.show()

In [15]:
#Histogram of Vertical
fig = px.histogram(x=df['Vertical'],nbins = 12)
fig.show()

In [16]:
#Histogram of Broad Jump
fig = px.histogram(x=df['Broad Jump'],nbins = 12)
fig.show()
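Plotly documents the underlying histogram bin setting as a *maximum* number of bins, so the charts above may show fewer than 12 bars. To see counts for exactly 12 equal-width intervals, as the question specifies, `numpy.histogram` can be used. This is a sketch; the `weights` values are a hypothetical sample, and in the notebook you would pass your own column (e.g. `df['Wt']`).

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for df['Wt']
weights = pd.Series([228, 221, 253, 318, 243, 178, 225, 194, 246, 341, 170, 307])

# Exactly 12 equal-width bins spanning the minimum to the maximum value
counts, edges = np.histogram(weights, bins=12)

# numpy's bins are half-open [a, b) except the last, which also includes the maximum
for i, (left, right, n) in enumerate(zip(edges[:-1], edges[1:], counts)):
    close = "]" if i == len(counts) - 1 else ")"
    print(f"[{left:.1f}, {right:.1f}{close}: {n}")
```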

### **QUESTION 4**

Construct a boxplot for your variable.  

In [17]:
#Boxplot of Wt
px.box(x=df['Wt'])

In [18]:
#Boxplot of 40yd
px.box(x=df['40yd'])

In [19]:
#Boxplot of Vertical
px.box(x=df['Vertical'])

In [20]:
#Boxplot of Broad Jump
px.box(x=df['Broad Jump'])

### **QUESTION 5**

Calculate the following summary statistics for your variable: minimum, maximum, mean, median, standard deviation, Q1, and Q3. Paste the output below.

In [21]:
df['Wt'].describe()

count    133.000000
mean     249.473684
std       49.181725
min      170.000000
25%      208.000000
50%      239.000000
75%      307.000000
max      341.000000
Name: Wt, dtype: float64

In [22]:
df['40yd'].describe()

count    133.000000
mean       4.711955
std        0.313297
min        4.280000
25%        4.460000
50%        4.590000
75%        4.950000
max        5.410000
Name: 40yd, dtype: float64

In [23]:
df['Vertical'].describe()

count    133.000000
mean      32.684211
std        4.545008
min       20.500000
25%       29.500000
50%       33.000000
75%       36.000000
max       42.000000
Name: Vertical, dtype: float64

In [24]:
df['Broad Jump'].describe()

count    133.000000
mean     118.541353
std        8.723454
min       99.000000
25%      111.000000
50%      121.000000
75%      125.000000
max      136.000000
Name: Broad Jump, dtype: float64
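Question 5 asks for seven specific statistics, while `describe()` also reports the count and labels the quartiles as percentages. A small helper such as `summary_stats` below (a hypothetical name, not part of pandas) returns exactly the requested values plus the IQR. pandas computes quartiles by linear interpolation by default; SPSS may use a different percentile definition, so its Q1 and Q3 can differ slightly.

```python
import pandas as pd

def summary_stats(values: pd.Series) -> pd.Series:
    """Return the Question 5 statistics (plus IQR) for one numeric column."""
    q1 = values.quantile(0.25)  # linear interpolation, same as describe()
    q3 = values.quantile(0.75)
    return pd.Series({
        "Minimum": values.min(),
        "Maximum": values.max(),
        "Mean": values.mean(),
        "Median": values.median(),
        "Std. deviation": values.std(),  # sample SD (ddof=1), as in describe()
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
    })

# A few 40yd times copied from the data snippet; in the notebook use summary_stats(df['40yd'])
sample = pd.Series([4.67, 4.53, 4.89, 5.30, 4.42])
stats = summary_stats(sample)
print(stats.round(3))
```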

### **QUESTION 6**

Use information from questions #3, #4, and #5 to describe your variable in terms of shape, center, spread, and outliers. Interpret your findings.
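Box plots such as Plotly's typically flag points beyond the 1.5 × IQR fences as outliers. The same rule can be checked numerically, and comparing the mean with the median gives a quick hint about shape (mean below the median suggests left skew, above suggests right skew). This is a minimal sketch assuming the common fences Q1 - 1.5·IQR and Q3 + 1.5·IQR; `outlier_report` is a hypothetical helper name, and the sample values are made up to include two outliers.

```python
import pandas as pd

def outlier_report(values: pd.Series) -> dict:
    """Apply the 1.5 x IQR rule and compare the mean with the median."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < lower) | (values > upper)]
    return {
        "lower_fence": lower,
        "upper_fence": upper,
        "n_outliers": len(outliers),
        "mean_minus_median": values.mean() - values.median(),
    }

# Made-up broad jumps (inches); in the notebook use outlier_report(df['Broad Jump'])
jumps = pd.Series([120, 122, 114, 102, 128, 121, 119, 123, 120, 160])
report = outlier_report(jumps)
print(report)
```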

THE DISTRIBUTION OF Wt IS MULTI-MODAL. THE MEDIAN IS 239 lbs AND THE IQR IS 99 lbs (307 - 208). THERE ARE NO OUTLIERS.

THE DISTRIBUTION OF 40yd IS SKEWED RIGHT. THE MEDIAN IS 4.59 SECONDS AND THE IQR IS 0.49 SECONDS (4.95 - 4.46). THERE ARE NO OUTLIERS.

THE DISTRIBUTION OF Vertical IS SYMMETRIC. THE MEAN IS 32.684 INCHES AND THE STANDARD DEVIATION IS 4.545 INCHES. THERE ARE NO OUTLIERS.

THE DISTRIBUTION OF Broad Jump IS SKEWED LEFT. THE MEDIAN IS 121 INCHES AND THE IQR IS 14 INCHES (125 - 111). THERE ARE NO OUTLIERS: THE 1.5 × IQR FENCES ARE 90 AND 146 INCHES, AND ALL VALUES FALL BETWEEN 99 AND 136 INCHES.

## **QUESTION 7**

Calculate the average weight and 40-yard dash time for linebackers (Pos = LB). Do the same for running backs (Pos = RB). Compare the results.
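One way to answer this is to keep only the two positions and group by `Pos`. The sketch below uses only the LB and RB rows that appear in the data snippet above, so its numbers are illustrative, not the answer; running the same expression on the full `df` gives the actual averages.

```python
import pandas as pd

# LB and RB rows copied from the data snippet above (not the full dataset)
sample = pd.DataFrame({
    "Player": ["Keaontay Ingram", "Jesse Luketa", "Troy Andersen", "Brian Robinson"],
    "Pos": ["RB", "LB", "LB", "RB"],
    "Wt": [221, 253, 243, 225],
    "40yd": [4.53, 4.89, 4.42, 4.53],
})

# Keep only linebackers and running backs, then average weight and 40-yard time per position
means = (
    sample[sample["Pos"].isin(["LB", "RB"])]
    .groupby("Pos")[["Wt", "40yd"]]
    .mean()
)
print(means)
```

With the full data, replace `sample` with `df`.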





## **QUESTION 8**

Generate a paragraph of at least 100 words to address one of the following questions:

### **QUESTION 8a**

Discuss how analyzing your chosen data set using statistical methods could help you become better prepared for future courses in your major.

This assignment helped me become a better problem solver with statistics and coding problems, which is especially useful because my major is Information Systems.

### **QUESTION 8b**

Discuss how analyzing your chosen data set using statistical methods could be instrumental in preparing you for your future career.