## Exploring fandom data

In [13]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split

In [2]:
df = pd.read_csv('sports_survey_2021.csv')

From the marketing data frame, columns that may be of interest:
-----
- 'S1': gender
- 'S2': age
- 'S4r1','S4r2','S4r3','S4r4','S4r5','S4r98','S4r99': ethnicity
- 'D4': Household income
    - 1: under 25,000
    - 2: 25,000-49,999
    - 3: 50,000-74,999
    - 4: 75,000-99,999
    - 5: 100,000-149,999
    - 6: 150,000 or more
- 'D5': employment status
    - 1: Full time
    - 2: Part time 
    - 3: Not Employed
    - 4: Homemaker
    - 5: Retired
    - 6: Prefer not to say
- 'D6': Educational attainment
    - 1: Some high school
    - 2: High school grad
    - 3: Vocational/trade school
    - 4: Some college
    - 5: College grad
    - 6: Post-graduate degree
    - 7: Prefer not to say
- 'S12r3': how many hours you spend watching in a typical week.
- 'S13': how many hours watching the following sports programming
    - 'S13r1' (live games)
    - 'S13r2' (highlight shows)
    - 'S13r3' (docuseries)
    - 'S13r4' (analysis/debate shows) 
- 'S15': how much of a fan, from 0 (not a fan) to 6 (obsessed fan).
    - 'S15r1' (NFL)
    - 'S15r2' (NBA)
    - 'S15r3' (College Football)
    - 'S15r4' (College Basketball)
    - 'S15r5' (MLB)
    - 'S15r6' (NHL)
    - 'S15r7' (Inter. Soccer)
    - 'S15r8' (MLS)
    - 'S15r9' (Combat Sports)
    - 'S15r10' (NASCAR)
    - 'S15r11' (F1)
- 'VL1': activities you have done in the past year in conjunction with the sports you follow:
    - 'VL1r1' (went to a game)
    - 'VL1r2' (watched at a sports bar)
    - 'VL1r3' (watched at a friend's home)
    - 'VL1r4' (placed a bet at a casino, etc.)
    - 'VL1r5' (listened to sports talk radio)
    - 'VL1r6' (called into sports talk radio)
    - 'VL1r7' (wore a team's jersey)
    - 'VL1r8' (watched at home)
    - 'VL1r9' (talked about games in person/by phone)
    - 'VL1r10' (talked about games online)
    - 'VL1r11' (played in fantasy)
    - 'VL1r12' (purchased multi-game tickets)
    - 'VL1r13' (bet in a group pool)
    - 'VL1r14' (played daily fantasy)
    - 'VL1r15' (none)
- 'VL2': interest in betting on sports, from 1 (not at all) to 6 (very interested).
 
- 'VL3': engagement in the following sports. 1: watching no matter who is playing, 2: watching only if my team is playing
     - 'VL3r1': NFL
     - 'VL3r2': NBA
     - 'VL3r3': CFB
     - 'VL3r4': CBB
     - 'VL3r5': MLB
     - 'VL3r6': NHL
- 'VL4': more or less of a fan than you used to be. 1: more of a fan, 2: less of a fan
     - 'VL4r1': NFL
     - 'VL4r2': NBA
     - 'VL4r3': CFB
     - 'VL4r4': CBB
     - 'VL4r5': MLB
     - 'VL4r6': NHL
- 'TEAM6': assuming games are available, how often you will watch NFL games for the following teams. Values are 1: every week, 2: most weeks, 3: some weeks, 4: only if it's a big game. 'TEAM6r1' through 'TEAM6r32' are the NFL teams.
- 'TEAM7': important factors driving interest in NFL games, from 1 (not at all) to 6 (very important).
     - 'TEAM7r1': a game with my favorite team
     - 'TEAM7r2': a game with a player I am interested in
     - 'TEAM7r3': a game with a division rival of my team
     - 'TEAM7r4': a game between teams with winning records
     - 'TEAM7r5': a classic NFL rivalry game (e.g. Packers vs Bears)
     - 'TEAM7r6': a game with star offensive player
     - 'TEAM7r7': a game with star defensive player
     - 'TEAM7r8': a game featuring a top pick from prev. draft. 
     - 'TEAM7r9': a game I bet on
     - 'TEAM7r10': a game with significance for my fantasy team
- 'TEAM8': describing why you watch your team, from 1 (does not describe at all) to 6 (describes completely).
     - 'TEAM8r1': hometown team
     - 'TEAM8r2': family watches this team
     - 'TEAM8r3': friends watch this team
     - 'TEAM8r4': a good team
     - 'TEAM8r5': always on TV
     - 'TEAM8r6': a team with a star offensive player
     - 'TEAM8r7': a team with a star defensive player
     - 'TEAM8r8': featuring top pick from prev draft. 
     - 'TEAM8r9': a team I regularly bet on. 
     - 'TEAM8r10': I have fantasy players on the team
     - 'TEAM8r11': it is/was the local team in a city I moved to
     - 'TEAM8r12': team I most identify with
     - 'TEAM8r13': they are entertaining
- 'NFL1': which of the following you subscribe to (0: no, 1: yes)
     - 'NFL1r1': NFL Sunday Ticket
     - 'NFL1r2': NFL RedZone
     - 'NFL1r3': NFL Network
     - 'NFL1r4': None
- 'NFL2': how frequently you watch games in the following time slots. 1: every week, 2: most weeks, 3: some weeks, 4: only if my team is playing, 5: never
     - 'NFL2r1': Thursday night
     - 'NFL2r2': Sunday Early game
     - 'NFL2r3': Sunday Late game
     - 'NFL2r4': Sunday night football
     - 'NFL2r5': Monday night football
- 'NFL3': how often will you bet on NFL games this season, 1: I will not bet, 2: one or two weeks, 3: some weeks, 4: most weeks, 5: every week
- 'NFL4': how many fantasy leagues do you play in, 1: none, 2: 1, 3: 2, 4: 3, 5: 4 or more. 
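
The coded demographic columns are easier to read once the integers are mapped back to the labels listed above. Below is a minimal sketch for 'D4', assuming the codes are stored as integers. The helper `decode_column` and the `toy` frame are illustrative names with made-up values, not part of the survey file.

```python
import pandas as pd

# Code-to-label lookup copied from the 'D4' notes above.
INCOME_LABELS = {
    1: "Under 25,000",
    2: "25,000-49,999",
    3: "50,000-74,999",
    4: "75,000-99,999",
    5: "100,000-149,999",
    6: "150,000 or more",
}


def decode_column(frame, column, labels):
    """Return the coded column as an ordered categorical of readable labels.

    Codes that are missing from `labels` become NaN.
    """
    categories = list(labels.values())
    return pd.Categorical(frame[column].map(labels),
                          categories=categories,
                          ordered=True)


# Hypothetical stand-in for df; these four rows are invented for illustration.
toy = pd.DataFrame({"D4": [1, 3, 6, 3]})
toy["income"] = decode_column(toy, "D4", INCOME_LABELS)

# Counts per income band, kept in band order rather than sorted by frequency.
print(toy["income"].value_counts(sort=False))
```

On the real data the same call would be `decode_column(df, 'D4', INCOME_LABELS)`; equivalent dictionaries for 'D5' and 'D6' can be built from the lists above, though those two are not naturally ordered.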



In [55]:
# Hold out 20% of respondents as a test set; the fixed seed keeps the split reproducible.
fan_train, fan_test = train_test_split(df,
                                       shuffle=True,
                                       random_state=555,
                                       test_size=0.2)

### Explore data here
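
One possible first look is the average fan score per league from the 'S15' columns. This is only a sketch under assumptions: that the 'S15r*' columns hold numeric scores on the 0 to 6 scale described above, and that a simple mean is a reasonable summary. The helper `mean_fan_score` and the `toy_train` frame are hypothetical, with invented values.

```python
import pandas as pd

# League names copied from the 'S15' notes above.
S15_LEAGUES = {
    "S15r1": "NFL",
    "S15r2": "NBA",
    "S15r3": "College Football",
    "S15r4": "College Basketball",
    "S15r5": "MLB",
    "S15r6": "NHL",
    "S15r7": "Inter. Soccer",
    "S15r8": "MLS",
    "S15r9": "Combat Sports",
    "S15r10": "NASCAR",
    "S15r11": "F1",
}


def mean_fan_score(frame):
    """Average fan score per league, highest first.

    Only the 'S15' columns actually present in `frame` are used.
    """
    cols = [c for c in S15_LEAGUES if c in frame.columns]
    return (frame[cols]
            .rename(columns=S15_LEAGUES)
            .mean()
            .sort_values(ascending=False))


# Hypothetical stand-in for fan_train; these rows are invented for illustration.
toy_train = pd.DataFrame({
    "S15r1": [6, 4, 5],
    "S15r2": [2, 3, 1],
    "S15r5": [0, 6, 3],
})
scores = mean_fan_score(toy_train)
print(scores)
```

On the real split the call would be `mean_fan_score(fan_train)`, which keeps the test set untouched during exploration.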