# Finding the Best Markets to Advertise In

### Introduction
We're working for an e-learning company that offers courses on programming. Most of our courses are on web and mobile development, but we also cover many other domains, such as data science and game development. We want to promote our product and we'd like to invest some money in advertising. **Our goal in this project is to find the two best markets to advertise our product in.**

To identify the best markets for promoting the courses, the e-learning company first needs the following information about potential learners:

<ul>
    <li>What people actually want to study: the most preferred courses and job roles.</li>
    <li>Financial investment: how much money learners are willing to spend on online education.</li>
    <li>Time investment: how much time learners would like to spend on courses.</li>
    <li>Demographic information, such as current residence, employment status and type, and educational qualifications.</li>
</ul>

### Results

# Importing Libraries

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Reading the Data

In [4]:
df = pd.read_csv('2017-fCC-New-Coders-Survey-Data.csv', low_memory=False)

In [10]:
df

Unnamed: 0,Age,AttendedBootcamp,BootcampFinish,BootcampLoanYesNo,BootcampName,BootcampRecommend,ChildrenNumber,CityPopulation,CodeEventConferences,CodeEventDjangoGirls,...,YouTubeFCC,YouTubeFunFunFunction,YouTubeGoogleDev,YouTubeLearnCode,YouTubeLevelUpTuts,YouTubeMIT,YouTubeMozillaHacks,YouTubeOther,YouTubeSimplilearn,YouTubeTheNewBoston
0,27.0,0.0,,,,,,more than 1 million,,,...,,,,,,,,,,
1,34.0,0.0,,,,,,"less than 100,000",,,...,1.0,,,,,,,,,
2,21.0,0.0,,,,,,more than 1 million,,,...,,,,1.0,1.0,,,,,
3,26.0,0.0,,,,,,"between 100,000 and 1 million",,,...,1.0,1.0,,,1.0,,,,,
4,20.0,0.0,,,,,,"between 100,000 and 1 million",,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18170,41.0,0.0,,,,,1.0,more than 1 million,,,...,,,,,,,,never see,,
18171,31.0,0.0,,,,,1.0,more than 1 million,,,...,1.0,,,,,,,,,
18172,39.0,0.0,,,,,3.0,more than 1 million,,,...,1.0,,,,,,,,,1.0
18173,54.0,0.0,,,,,3.0,"between 100,000 and 1 million",,,...,1.0,,,1.0,,,1.0,,,


In [7]:
df.describe()

Unnamed: 0,Age,AttendedBootcamp,BootcampFinish,BootcampLoanYesNo,BootcampRecommend,ChildrenNumber,CodeEventConferences,CodeEventDjangoGirls,CodeEventFCC,CodeEventGameJam,...,YouTubeEngineeredTruth,YouTubeFCC,YouTubeFunFunFunction,YouTubeGoogleDev,YouTubeLearnCode,YouTubeLevelUpTuts,YouTubeMIT,YouTubeMozillaHacks,YouTubeSimplilearn,YouTubeTheNewBoston
count,15367.0,17709.0,1069.0,1079.0,1073.0,2314.0,1609.0,165.0,1708.0,290.0,...,993.0,6036.0,1261.0,3539.0,2662.0,1396.0,3327.0,622.0,201.0,2960.0
mean,27.691872,0.062002,0.699719,0.305839,0.818267,1.832325,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
std,8.559239,0.241167,0.458594,0.460975,0.385805,0.972813,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
min,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
25%,22.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
50%,26.0,0.0,1.0,0.0,1.0,2.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
75%,32.0,0.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
max,90.0,1.0,1.0,1.0,1.0,9.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18175 entries, 0 to 18174
Columns: 136 entries, Age to YouTubeTheNewBoston
dtypes: float64(105), object(31)
memory usage: 18.9+ MB


In [9]:
df.isna().sum()

Age                     2808
AttendedBootcamp         466
BootcampFinish         17106
BootcampLoanYesNo      17096
BootcampName           17226
                       ...  
YouTubeMIT             14848
YouTubeMozillaHacks    17553
YouTubeOther           16961
YouTubeSimplilearn     17974
YouTubeTheNewBoston    15215
Length: 136, dtype: int64

In [12]:
round(df.isnull().sum()*100 / len(df), 2)

Age                    15.45
AttendedBootcamp        2.56
BootcampFinish         94.12
BootcampLoanYesNo      94.06
BootcampName           94.78
                       ...  
YouTubeMIT             81.69
YouTubeMozillaHacks    96.58
YouTubeOther           93.32
YouTubeSimplilearn     98.89
YouTubeTheNewBoston    83.71
Length: 136, dtype: float64

In [13]:
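# Note: len() of a boolean Series returns the total number of columns,
# not the number of columns that contain nulls
len(df.isna().sum() > 0)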

136

In [18]:
# Count the columns that contain at least one null value
aux = df.isna().sum()
len(aux[aux > 0])

132

The previous analysis leads to several observations. The dataset consists of **18175 entries** (rows) and **136 features** (columns). **132** of the **136** columns contain **null** values, and, more problematic for our analysis, some columns have more than **90%** null values.
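The two figures above (how many columns contain nulls, and which columns are mostly empty) can be derived from a single per-column percentage. The following is a minimal sketch on a toy frame; the helper name `null_share`, the toy values, and the 90% threshold variable are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd


def null_share(frame: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, rounded to 2 decimals."""
    return (frame.isna().mean() * 100).round(2)


# Toy stand-in for the survey data (hypothetical values, not the real dataset)
toy = pd.DataFrame({
    "Age": [27.0, 34.0, np.nan, 26.0],
    "BootcampName": [np.nan, np.nan, np.nan, np.nan],
    "CountryLive": ["A", "B", "C", "D"],
})

share = null_share(toy)

# Number of columns with at least one missing value
cols_with_nulls = int((share > 0).sum())

# Columns where more than 90% of the values are missing
threshold = 90
mostly_empty = share[share > threshold].index.tolist()

print(cols_with_nulls, mostly_empty)
```

Applied to the survey DataFrame, the same helper would be called as `null_share(df)`; the threshold is a judgment call and can be adjusted.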

We don't have clear documentation describing each column. Fortunately, most column names are fairly self-explanatory, and the raw-data folder of the dataset repository contains the initial survey questions. From this information, we can conclude that the vast majority of columns are of no interest for our analysis goal, for one of two reasons:
<ul>
    <li><b>Irrelevance:</b> Columns such as HasServedInMilitary, CityPopulation, and IsEthnicMinority are irrelevant to the scope of our analysis.</li>
    <li><b>Referring to previously used learning resources:</b> Columns such as Podcast..., CodeEvent..., Resource..., and YouTube... describe resources whose usefulness is subjective, since one source can be extremely helpful for one person and totally useless for another.</li>
</ul>  

We conclude that the relevant columns for our analysis goal are: 
<ul>
    <li>Age</li>
    <li>AttendedBootcamp</li>
    <li>CountryCitizen</li>
    <li>CountryLive</li>
    <li>EmploymentField</li>
    <li>EmploymentStatus</li>
    <li>Gender</li>
    <li>HasChildren</li>
    <li>HasDebt</li>
    <li>HasFinancialDependents</li>
    <li>HasHomeMortgage</li>
    <li>HasStudentDebt</li>
    <li>HoursLearning</li>
    <li>Income</li>
    <li>JobRoleInterest</li>
    <li>MaritalStatus</li>
    <li>MoneyForLearning</li>
    <li>MonthsProgramming</li>
    <li>SchoolDegree</li>
    <li>SchoolMajor</li>      
</ul>
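
One way to narrow the data to the columns listed above is sketched below. The constant `RELEVANT_COLUMNS`, the helper `keep_relevant`, and the toy frame are assumptions made for illustration; the original notebook does not show how the subset is built.

```python
import pandas as pd

# Column names copied from the list above
RELEVANT_COLUMNS = [
    "Age", "AttendedBootcamp", "CountryCitizen", "CountryLive",
    "EmploymentField", "EmploymentStatus", "Gender", "HasChildren",
    "HasDebt", "HasFinancialDependents", "HasHomeMortgage", "HasStudentDebt",
    "HoursLearning", "Income", "JobRoleInterest", "MaritalStatus",
    "MoneyForLearning", "MonthsProgramming", "SchoolDegree", "SchoolMajor",
]


def keep_relevant(frame: pd.DataFrame, columns=RELEVANT_COLUMNS) -> pd.DataFrame:
    """Return a copy of the frame restricted to the relevant columns it contains."""
    # Skip names that are absent so that a typo or schema change does not raise
    present = [col for col in columns if col in frame.columns]
    return frame[present].copy()


# Toy stand-in for the survey data (hypothetical values, not the real dataset)
toy = pd.DataFrame({
    "Age": [27.0],
    "CountryLive": ["Exampleland"],
    "YouTubeMIT": [1.0],
})

subset = keep_relevant(toy)
print(list(subset.columns))
```

On the survey DataFrame this would be `keep_relevant(df)`. Silently skipping missing names is a design choice; a stricter variant could raise an error instead.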