# Part 2: Linear Regression


In this part, we will be working with a dataset scraped by [Shubham Maurya](https://www.kaggle.com/mauryashubham/linear-regression-to-predict-market-value/data), which collects facts about players in the English Premier League as of 2017. His original goal was to establish whether there was a relationship between a player's popularity and his market value, as estimated by transfermarkt.com.

**Your goal is to fit a model able to predict a player's market value.**

## The dataset

The dataset contains the following information:

<table >
<tr>
<th><b>Field</b></th>
<th><b>Description</b></th>
</tr>
<tr><td> name </td><td> Name of the player </td></tr>
<tr><td> club </td><td> Club of the player </td></tr>
<tr><td> age </td><td> Age of the player </td></tr>
<tr><td> position </td><td> The usual position on the pitch </td></tr>
<tr><td>position_cat</td><td> 1 for attackers, 2 for midfielders, 3 for defenders, 4 for goalkeepers</td></tr>
<tr><td>market_value</td><td> As of July 20th, 2017 on transfermarkt.com</td></tr>
<tr><td>page_views</td><td> Average daily Wikipedia page views from September 1, 2016 to May 1, 2017</td></tr>
<tr><td>fpl_value</td><td> Value in Fantasy Premier League as of July 20th, 2017</td></tr>
<tr><td>fpl_sel</td><td> % of FPL players who have selected that player in their team</td></tr>
<tr><td>fpl_points</td><td> FPL points accumulated over the previous season</td></tr>
<tr><td>region</td><td> 1 for England, 2 for EU, 3 for Americas, 4 for Rest of World</td></tr>
<tr><td>nationality</td><td> Player's nationality</td></tr>
<tr><td>new_foreign</td><td> Whether a new signing from a different league, for 2017/18 (till 20th July)</td></tr>
<tr><td>age_cat</td><td> a categorical version of the Age feature</td></tr>
<tr><td>club_id</td><td> a numerical version of the Club feature</td></tr>
<tr><td>big_club</td><td> Whether one of the Top 6 clubs</td></tr>
<tr><td>new_signing</td><td> Whether a new signing for 2017/18 (till 20th July)</td></tr>
</table>
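
The integer codes in fields such as `position_cat` and `region` are category labels, not quantities. As a minimal sketch of how such codes could be mapped back to readable labels, the snippet below uses three players whose `position_cat` values appear in the dataset. The names `position_labels`, `toy_df`, and `position_label` are illustrative assumptions and not part of the dataset.

```julia
using DataFrames

# Code-to-label mapping taken from the field description of position_cat
position_labels = Dict(1 => "attacker", 2 => "midfielder", 3 => "defender", 4 => "goalkeeper")

# A small hand-built DataFrame (illustrative only, not the full dataset)
toy_df = DataFrame(
    name = ["Alexis Sanchez", "Petr Cech", "Laurent Koscielny"],
    position_cat = [1, 4, 3],
)

# Add a readable column by looking up each integer code in the mapping
toy_df.position_label = [position_labels[code] for code in toy_df.position_cat]

println(toy_df)
```

Keeping the original integer column alongside the label column makes it easy to check that the mapping was applied as intended.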


## Exercise 1: Exploring the data
The first step is to explore your data.

We will start with the necessary imports. In this exercise, we will be working with the Julia packages `DataFrames` and `CSV`. If you are not familiar with them, it is recommended that you follow the introductory exercises that can be found in the course's GitHub repository.

In [11]:
using DataFrames, CSV

We will now proceed to read the dataset:

In [14]:
league_df = DataFrame(CSV.File("data/football_data.csv")) #Reads a CSV file

| Row | name | club | age | position | position_cat | market_value |
|---|---|---|---|---|---|---|
| | String31 | String31 | Int64 | String3 | Int64 | Float64 |
| 1 | Alexis Sanchez | Arsenal | 28 | LW | 1 | 65.0 |
| 2 | Mesut Ozil | Arsenal | 28 | AM | 1 | 50.0 |
| 3 | Petr Cech | Arsenal | 35 | GK | 4 | 7.0 |
| 4 | Theo Walcott | Arsenal | 28 | RW | 1 | 20.0 |
| 5 | Laurent Koscielny | Arsenal | 31 | CB | 3 | 22.0 |
| 6 | Hector Bellerin | Arsenal | 22 | RB | 3 | 30.0 |
| 7 | Olivier Giroud | Arsenal | 30 | CF | 1 | 22.0 |
| 8 | Nacho Monreal | Arsenal | 31 | LB | 3 | 13.0 |
| 9 | Shkodran Mustafi | Arsenal | 25 | CB | 3 | 30.0 |
| 10 | Alex Iwobi | Arsenal | 21 | LW | 1 | 10.0 |
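
After loading a file, it can help to check the shape and column names before going further. The following is a minimal sketch, assuming a small DataFrame built by hand from three of the rows shown above so that it runs without the CSV file. The name `sample_df` is an illustrative assumption; with the real data you would call the same functions on `league_df`.

```julia
using DataFrames

# A few rows copied from the output above (a subset of columns, for illustration)
sample_df = DataFrame(
    name = ["Alexis Sanchez", "Mesut Ozil", "Petr Cech"],
    club = ["Arsenal", "Arsenal", "Arsenal"],
    age = [28, 28, 35],
    market_value = [65.0, 50.0, 7.0],
)

# size returns a (rows, columns) tuple
n_rows, n_cols = size(sample_df)

# names returns the column names as a vector of strings
column_names = names(sample_df)

println("rows = $n_rows, columns = $n_cols")
println(column_names)
```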


### Task 1.1: Using DataFrames for data exploration
Use the method `first(name_dataframe, N)` (N is the number of entries) to look at the first instances of the dataframe. 

Then, use `describe(name_dataframe)` to generate descriptive statistics that summarize each field of the dataframe.

Finally, print the result of `eltype.(eachcol(name_dataframe))` to see the data type associated with each field in the table.
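
In DataFrames.jl, the three steps of this task correspond to `first`, `describe`, and `eltype.(eachcol(...))`. The sketch below tries them on a small hand-built DataFrame rather than on `league_df`, so the exercise itself is left to you. The names `toy_df`, `head_df`, `summary_df`, and `col_types` are illustrative assumptions.

```julia
using DataFrames

# Small hand-built DataFrame with values taken from the first rows of the dataset
toy_df = DataFrame(
    name = ["Alexis Sanchez", "Mesut Ozil", "Petr Cech", "Theo Walcott"],
    age = [28, 28, 35, 28],
    market_value = [65.0, 50.0, 7.0, 20.0],
)

# First 2 rows of the DataFrame
head_df = first(toy_df, 2)

# Summary statistics: one row per column of toy_df
summary_df = describe(toy_df)

# Element type of each column, in column order
col_types = eltype.(eachcol(toy_df))

println(head_df)
println(summary_df)
println(col_types)
```

Note that `describe` returns a DataFrame itself, so its result can be inspected or filtered like any other table.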

In [18]:
#Your code for first
first(league_df,10)

| Row | name | club | age | position | position_cat | market_value | page_views |
|---|---|---|---|---|---|---|---|
| | String31 | String31 | Int64 | String3 | Int64 | Float64 | Int64 |
| 1 | Alexis Sanchez | Arsenal | 28 | LW | 1 | 65.0 | 4329 |
| 2 | Mesut Ozil | Arsenal | 28 | AM | 1 | 50.0 | 4395 |
| 3 | Petr Cech | Arsenal | 35 | GK | 4 | 7.0 | 1529 |
| 4 | Theo Walcott | Arsenal | 28 | RW | 1 | 20.0 | 2393 |
| 5 | Laurent Koscielny | Arsenal | 31 | CB | 3 | 22.0 | 912 |
| 6 | Hector Bellerin | Arsenal | 22 | RB | 3 | 30.0 | 1675 |
| 7 | Olivier Giroud | Arsenal | 30 | CF | 1 | 22.0 | 2230 |
| 8 | Nacho Monreal | Arsenal | 31 | LB | 3 | 13.0 | 555 |
| 9 | Shkodran Mustafi | Arsenal | 25 | CB | 3 | 30.0 | 1877 |
| 10 | Alex Iwobi | Arsenal | 21 | LW | 1 | 10.0 | 1812 |


# References
- [DataFrames.jl](https://dataframes.juliadata.org/)
- [Importing and Exporting Data (I/O)](https://dataframes.juliadata.org/stable/man/importing_and_exporting)