# Real-world Data Wrangling

In this project, you will apply the skills you acquired in the course to gather and wrangle real-world data with two datasets of your choice.

You will retrieve and extract the data, assess the data programmatically and visually, accross elements of data quality and structure, and implement a cleaning strategy for the data. You will then store the updated data into your selected database/data store, combine the data, and answer a research question with the datasets.

Throughout the process, you are expected to:

1. Explain your decisions towards methods used for gathering, assessing, cleaning, storing, and answering the research question
2. Write code comments so your code is more readable

## 1. Gather data

In this section, you will extract data using two different data gathering methods and combine the data. Use at least two different types of data-gathering methods.

### **1.1.** Problem Statement
In 2-4 sentences, explain the kind of problem you want to look at and the datasets you will be wrangling for this project.

*FILL IN:*

### **1.2.** Gather at least two datasets using two different data gathering methods

List of data gathering methods:

- Download data manually
- Programmatically downloading files
- Gather data by accessing APIs
- Gather and extract data from HTML files using BeautifulSoup
- Extract data from a SQL database

Each dataset must have at least two variables, and have greater than 500 data samples within each dataset.

For each dataset, briefly describe why you picked the dataset and the gathering method (2-3 full sentences), including the names and significance of the variables in the dataset. Show your work (e.g., if using an API to download the data, please include a snippet of your code). 

Load the dataset programmtically into this notebook.

#### **Dataset 1**

Type: Relational Databases

 Method: The data was gathered using the MySQL Database and was connected to the jupiter's notebook. The source comes from 'Stack Overflow.com', which is a trusted vetted website which is used by millions of developers each day. This data contains a survey that was completed by thousands of developers. This make this a valuable source to answer our question. 

Dataset variables:

*   Age of participant
*   Main Branch

In [4]:
pip install PyMySQL

Note: you may need to restart the kernel to use updated packages.


In [5]:
#FILL IN 1st data gathering and loading method
import pandas as pd 

from sqlalchemy import create_engine # This needs to be created for connection to database.

import pymysql # Package for working with MySQL, which it the database of choose to use.


In [6]:
user = "root"
password = "Lowercase1"
host =  "localhost"
db_name = "Udacity_Dataset"
port = 3306

# The following  will allow connections to MySQL Database
sqlEngine      = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{db_name}',pool_recycle=port)
dbConnection   = sqlEngine.connect()

In [7]:
#Gathering all columns form the table
sql = """select *
from survey_result"""

In [8]:
#Read data from database and input content to dataframe
df = pd.read_sql(sql,dbConnection, index_col = "ResponseId")
df.head(5)

Unnamed: 0_level_0,MainBranch,Age,Employment,RemoteWork,Check,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,TechDoc,...,JobSatPoints_6,JobSatPoints_7,JobSatPoints_8,JobSatPoints_9,JobSatPoints_10,JobSatPoints_11,SurveyLength,SurveyEase,ConvertedCompYearly,JobSat
ResponseId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,I am a developer by profession,Under 18 years old,"Employed, full-time",Remote,Apples,Hobby,Primary/elementary school,Books / Physical media,,,...,,,,,,,,,,
2,I am a developer by profession,35-44 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects;Other...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,API document(s) and/or SDK document(s);User gu...,...,0.0,0.0,0.0,0.0,0.0,0.0,,,,
3,I am a developer by profession,45-54 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects;Other...,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,API document(s) and/or SDK document(s);User gu...,...,,,,,,,Appropriate in length,Easy,,
4,I am learning to code,18-24 years old,"Student, full-time",,Apples,,Some college/university study without earning ...,"Other online resources (e.g., videos, blogs, f...",Stack Overflow;How-to videos;Interactive tutorial,,...,,,,,,,Too long,Easy,,
5,I am a developer by profession,18-24 years old,"Student, full-time",,Apples,,"Secondary school (e.g. American high school, G...","Other online resources (e.g., videos, blogs, f...",Technical documentation;Blogs;Written Tutorial...,API document(s) and/or SDK document(s);User gu...,...,,,,,,,Too short,Easy,,


#### Dataset 2

Type: CSV File.

Method:  The data was gathered by manually downloading the data. The source comes from 'FREECODECAMP.com', which is one of the largest online coding websites to date. The data contains survey which was conducted by the website and contains enough responses to able to analyze to answer the question.
Dataset variables:

*  Hours by week on coding.
*   How many months learning programming.

In [9]:
#FILL IN 2nd data gathering and loading method
df_two = pd.read_csv('2021 New Coder Survey.csv', encoding='utf-8')
df_two.head(5)

  df_two = pd.read_csv('2021 New Coder Survey.csv', encoding='utf-8')


Unnamed: 0,Timestamp,1. What is your biggest reason for learning to code?,2. What methods have you used to learn about coding? Please select all that apply.,3. Which online learning resources have you found helpful? Please select all that apply.,"4. If you have attended in-person coding-related events before, which ones have you found helpful? Please select all that apply.","5. If you have listened to coding-related podcasts before, which ones have you found helpful? Please select all that apply.","6. If you have watched coding-related YouTube videos before, which channels have you found helpful? Please select all that apply.",7. About how many hours do you spend learning each week?,8. About how many months have you been programming?,"9. Aside from university tuition, about how much money have you spent on learning to code so far (in US Dollars)?",...,45. Please tell us how satisfied you are with each of these following aspects of your present job [Job security],45. Please tell us how satisfied you are with each of these following aspects of your present job [Work-life balance],45. Please tell us how satisfied you are with each of these following aspects of your present job [Professional growth or leadership opportunities],45. Please tell us how satisfied you are with each of these following aspects of your present job [Workplace/company culture],45. Please tell us how satisfied you are with each of these following aspects of your present job [Diverse and inclusive work environment],45. Please tell us how satisfied you are with each of these following aspects of your present job [Weekly workload],46. About how many minutes does it take you to get to work each day?,47. Have you served in your country's military before?,48. Do you currently receive disability benefits from your government?,49. Do you have high speed internet at your home?
0,7/1/2021 10:10:23,To succeed in current career,"Online resources, Books, In-person bootcamps, ...","freeCodeCamp, Mozilla Developer Network (MDN),...","conferences, workshops, Meetup.com events",The Changelog,"CS Dojo, freeCodeCamp",4.0,120,,...,Somewhat satisfied,Somewhat dissatisfied,I do not know,Somewhat satisfied,Somewhat satisfied,Very dissatisfied,I work from home,No,No,Yes
1,7/1/2021 10:31:01,To change careers,"Online resources, Books, Online bootcamps","freeCodeCamp, Mozilla Developer Network (MDN),...",I haven't attended any in-person coding-relate...,"The Changelog, Code Newbie Podcast","Adrian Twarog, Code with Ania Kubów, Coder Cod...",10.0,6,30.0,...,Very dissatisfied,Somewhat satisfied,Somewhat dissatisfied,Somewhat dissatisfied,Somewhat satisfied,Somewhat satisfied,15 to 29 minutes,No,Yes,Yes
2,7/1/2021 10:42:31,To change careers,"Online resources, Books, Hackathons, Meetup.co...","freeCodeCamp, Mozilla Developer Network (MDN),...",Meetup.com events,I haven't listened to any podcasts,"AmigosCode, Dev Ed, freeCodeCamp, Kevin Powell...",30.0,48,300.0,...,Not Applicable,Not Applicable,Not Applicable,Not Applicable,Not Applicable,Not Applicable,I am not working,No,No,Yes
3,7/1/2021 11:06:43,As a hobby,"Online resources, Books","freeCodeCamp, Mozilla Developer Network (MDN),...",I haven't attended any in-person coding-relate...,"Darknet Diaries, Real Python Podcast","freeCodeCamp, Traversy Media",,36,0.0,...,,,,,,,I am not working,No,No,No
4,7/1/2021 11:14:31,To start your first career,"Online resources, Books, Online bootcamps","freeCodeCamp, Stack Overflow, Coursera, Udemy",I haven't attended any in-person coding-relate...,Talk Python to Me,"freeCodeCamp, The Net Ninja, Traversy Media",2.0,24,5000.0,...,Somewhat dissatisfied,Somewhat satisfied,Somewhat dissatisfied,Somewhat satisfied,Somewhat dissatisfied,Somewhat dissatisfied,45 to 60 minutes,No,No,Yes


Optional data storing step: You may save your raw dataset files to the local data store before moving to the next step.

In [10]:
#Optional: store the raw data in your local data store

## 2. Assess data

Assess the data according to data quality and tidiness metrics using the report below.

List **two** data quality issues and **two** tidiness issues. Assess each data issue visually **and** programmatically, then briefly describe the issue you find.  **Make sure you include justifications for the methods you use for the assessment.**

### Quality Issue 1:

In [11]:
#FILL IN - Inspecting the dataframe visually
df_two.sample(10)

Unnamed: 0,Timestamp,1. What is your biggest reason for learning to code?,2. What methods have you used to learn about coding? Please select all that apply.,3. Which online learning resources have you found helpful? Please select all that apply.,"4. If you have attended in-person coding-related events before, which ones have you found helpful? Please select all that apply.","5. If you have listened to coding-related podcasts before, which ones have you found helpful? Please select all that apply.","6. If you have watched coding-related YouTube videos before, which channels have you found helpful? Please select all that apply.",7. About how many hours do you spend learning each week?,8. About how many months have you been programming?,"9. Aside from university tuition, about how much money have you spent on learning to code so far (in US Dollars)?",...,45. Please tell us how satisfied you are with each of these following aspects of your present job [Job security],45. Please tell us how satisfied you are with each of these following aspects of your present job [Work-life balance],45. Please tell us how satisfied you are with each of these following aspects of your present job [Professional growth or leadership opportunities],45. Please tell us how satisfied you are with each of these following aspects of your present job [Workplace/company culture],45. Please tell us how satisfied you are with each of these following aspects of your present job [Diverse and inclusive work environment],45. Please tell us how satisfied you are with each of these following aspects of your present job [Weekly workload],46. About how many minutes does it take you to get to work each day?,47. Have you served in your country's military before?,48. Do you currently receive disability benefits from your government?,49. Do you have high speed internet at your home?
10820,8/31/2021 8:40:24,To start a business or to freelance,Online resources,"Mozilla Developer Network (MDN), Stack Overflow",hackathons,I haven't listened to any podcasts,I haven't watched any coding-related YouTube v...,2.0,60.0,80.0,...,Very satisfied,Very satisfied,Very satisfied,Very satisfied,Somewhat dissatisfied,Somewhat satisfied,15 to 29 minutes,No,No,Yes
15994,9/20/2021 19:00:10,To succeed in current career,"Online resources, Books, Meetup.com events, Co...","freeCodeCamp, Stack Overflow, Coursera, Codeca...","conferences, Meetup.com events",Learn To Code With Me,I haven't watched any coding-related YouTube v...,10.0,7.0,200.0,...,Very satisfied,Very satisfied,Somewhat satisfied,Somewhat satisfied,Somewhat satisfied,Very satisfied,Less than 15 minutes,No,No,Yes
349,7/9/2021 1:06:09,To start your first career,Online resources,"freeCodeCamp, Mozilla Developer Network (MDN),...",I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,"Coder Coder, Web Dev Simplified",10.0,8.0,0.0,...,,,,,,,I am not working,No,No,Yes
12086,9/5/2021 3:39:00,To start your first career,"Online resources, College Education",Stack Overflow,I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,I haven't watched any coding-related YouTube v...,21.0,36.0,0.0,...,Somewhat satisfied,Somewhat satisfied,Very satisfied,Very satisfied,Very satisfied,Very satisfied,30 to 44 minutes,No,No,Yes
4289,8/9/2021 18:00:43,To succeed in current career,"Online resources, friends","freeCodeCamp, w3schools.com",I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,I haven't watched any coding-related YouTube v...,3.0,0.0,0.0,...,,,,,,,More than 60 minutes,No,No,Yes
5452,8/12/2021 18:42:00,To succeed in current career,"Online resources, Books, Online bootcamps","Mozilla Developer Network (MDN), Stack Overflo...",I haven't attended any in-person coding-relate...,Software Engineering Daily,"Coding Train, The Net Ninja, Traversy Media",20.0,144.0,1500.0,...,Very satisfied,Somewhat dissatisfied,Somewhat dissatisfied,Somewhat satisfied,Somewhat satisfied,Somewhat dissatisfied,I work from home,Yes,No,Yes
15993,9/20/2021 18:52:58,To change careers,Online resources,"Khan Academy, Codecademy, YouTube",,,CS Dojo,,,,...,Somewhat satisfied,Somewhat satisfied,Very dissatisfied,Somewhat dissatisfied,Somewhat satisfied,Somewhat satisfied,I work from home,No,No,No
14558,9/14/2021 18:55:16,To start your first career,Online resources,"freeCodeCamp, Codecademy",I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,I haven't watched any coding-related YouTube v...,2.0,1.0,0.0,...,Somewhat satisfied,Very satisfied,Very dissatisfied,Somewhat dissatisfied,Very dissatisfied,Somewhat satisfied,15 to 29 minutes,No,No,Yes
2752,8/5/2021 23:56:15,i feel like it,this my first time,freeCodeCamp,I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,I haven't watched any coding-related YouTube v...,0.0,0.0,0.0,...,I do not know,I do not know,I do not know,I do not know,I do not know,I do not know,I am not working,No,No,Yes
17902,9/28/2021 9:52:12,To start your first career,"Online resources, Books, Online bootcamps","freeCodeCamp, Stack Overflow, Pluralsight, Uda...",I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,"freeCodeCamp, Programming With Mosh, thenewbos...",5.0,5.0,0.0,...,Somewhat dissatisfied,Somewhat dissatisfied,Somewhat satisfied,Somewhat satisfied,Somewhat satisfied,Somewhat satisfied,45 to 60 minutes,No,No,Yes


In [12]:
#FILL IN - Inspecting the dataframe programmatically
#This dataset has 18,126 responses. 
df_two.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18126 entries, 0 to 18125
Data columns (total 63 columns):
 #   Column                                                                                                                                                                      Non-Null Count  Dtype  
---  ------                                                                                                                                                                      --------------  -----  
 0   Timestamp                                                                                                                                                                   18126 non-null  object 
 1   1. What is your biggest reason for learning to code?                                                                                                                        17991 non-null  object 
 2   2. What methods have you used to learn about coding? Please select all that apply.

Issue and justification:

On first inspection of the dataset provided, it can be observed that in columns 2 and 3 have a high count of responses.This shows that for these prompt the users we allowed to answer on a free-text response or multi-choice response. Although having responses with high count responses might help uniqueness, it decreases accuracy as the dataset is not standardized.

### Tidiness Issue 1:

In [13]:
#FILL IN - Inspecting the dataframe visually

df_two.sample(10)

Unnamed: 0,Timestamp,1. What is your biggest reason for learning to code?,2. What methods have you used to learn about coding? Please select all that apply.,3. Which online learning resources have you found helpful? Please select all that apply.,"4. If you have attended in-person coding-related events before, which ones have you found helpful? Please select all that apply.","5. If you have listened to coding-related podcasts before, which ones have you found helpful? Please select all that apply.","6. If you have watched coding-related YouTube videos before, which channels have you found helpful? Please select all that apply.",7. About how many hours do you spend learning each week?,8. About how many months have you been programming?,"9. Aside from university tuition, about how much money have you spent on learning to code so far (in US Dollars)?",...,45. Please tell us how satisfied you are with each of these following aspects of your present job [Job security],45. Please tell us how satisfied you are with each of these following aspects of your present job [Work-life balance],45. Please tell us how satisfied you are with each of these following aspects of your present job [Professional growth or leadership opportunities],45. Please tell us how satisfied you are with each of these following aspects of your present job [Workplace/company culture],45. Please tell us how satisfied you are with each of these following aspects of your present job [Diverse and inclusive work environment],45. Please tell us how satisfied you are with each of these following aspects of your present job [Weekly workload],46. About how many minutes does it take you to get to work each day?,47. Have you served in your country's military before?,48. Do you currently receive disability benefits from your government?,49. Do you have high speed internet at your home?
14320,9/13/2021 23:22:40,To start your first career,Online resources,"freeCodeCamp, Coursera, Khan Academy, Udemy, L...",I haven't attended any in-person coding-relate...,none,,20.0,0.0,0.0,...,Very dissatisfied,Somewhat dissatisfied,Very dissatisfied,Very dissatisfied,Somewhat dissatisfied,Somewhat satisfied,15 to 29 minutes,Yes,No,Yes
13806,9/11/2021 17:38:00,To start your first career,"Online resources, Books, Online bootcamps","freeCodeCamp, Stack Overflow, Coursera, Khan A...",I haven't attended any in-person coding-relate...,Devs Cansados,"RocketSeat, Curso em Vídeo, Boson Treinamentos...",12.0,24.0,50.0,...,Somewhat dissatisfied,Somewhat satisfied,Somewhat satisfied,Somewhat satisfied,Somewhat satisfied,Somewhat satisfied,More than 60 minutes,No,No,No
16078,9/21/2021 3:28:20,To succeed in current career,"Online resources, Books","freeCodeCamp, Khan Academy",I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,azadchaiwala,10.0,2.0,30.0,...,Somewhat satisfied,Somewhat dissatisfied,Somewhat dissatisfied,Somewhat dissatisfied,Somewhat dissatisfied,Very satisfied,Less than 15 minutes,No,No,Yes
7619,8/19/2021 16:52:19,To succeed in current career,Online resources,freeCodeCamp,I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,freeCodeCamp,10.0,1.0,0.0,...,Very dissatisfied,Very dissatisfied,Somewhat satisfied,Very dissatisfied,Somewhat satisfied,I do not know,Less than 15 minutes,No,No,No
13592,9/10/2021 18:28:07,To change careers,"Online resources, Books","freeCodeCamp, Stack Overflow, Udemy, HackerRank",I haven't attended any in-person coding-relate...,Code Newbie Podcast,"Dev Ed, freeCodeCamp, Kevin Powell",7.0,24.0,50.0,...,Very satisfied,Very satisfied,Somewhat satisfied,Very satisfied,Very satisfied,Very satisfied,15 to 29 minutes,No,No,Yes
8161,8/21/2021 23:35:09,To start your first career,"Online resources, Books","freeCodeCamp, Mozilla Developer Network (MDN),...",I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,"Code with Ania Kubów, freeCodeCamp, Tech with Tim",,,,...,,,,,,,I am not working,No,No,No
15198,9/17/2021 9:35:36,To change careers,Online resources,"freeCodeCamp, Mozilla Developer Network (MDN),...",I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,Kevin Powell,30.0,2.0,16.0,...,Very dissatisfied,Somewhat satisfied,Somewhat dissatisfied,Very satisfied,Very satisfied,Very satisfied,30 to 44 minutes,No,No,Yes
10322,8/29/2021 14:33:38,To create art or entertainment,"Online resources, Online bootcamps, Workshops","freeCodeCamp, Stack Overflow, HackerRank",I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,freeCodeCamp,1.0,2.0,6500.0,...,,,,,,,,No,No,No
6007,8/14/2021 16:54:33,To succeed in current career,"Online resources, Books","freeCodeCamp, Stack Overflow",I haven't attended any in-person coding-relate...,Talk Python to Me,"Code with Ania Kubów, Dev Ed, freeCodeCamp, Ja...",12.0,48.0,0.0,...,I do not know,I do not know,I do not know,I do not know,I do not know,I do not know,I work from home,No,No,No
10281,8/29/2021 12:10:55,To meet school requirements,"Online resources, Books",,I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,,2.0,100.0,1600.0,...,,,,,,,I am not working,No,No,Yes


In [14]:
#FILL IN - Inspecting the dataframe programmatically
df_two.columns

Index(['Timestamp', '1. What is your biggest reason for learning to code?',
       '2. What methods have you used to learn about coding? Please select all that apply.',
       '3. Which online learning resources have you found helpful? Please select all that apply.',
       '4. If you have attended in-person coding-related events before, which ones have you found helpful? Please select all that apply.',
       '5. If you have listened to coding-related podcasts before, which ones have you found helpful? Please select all that apply.',
       '6. If you have watched coding-related YouTube videos before, which channels have you found helpful? Please select all that apply.',
       '7. About how many hours do you spend learning each week?',
       '8. About how many months have you been programming?',
       '9. Aside from university tuition, about how much money have you spent on learning to code so far (in US Dollars)?',
       '10. Are you already employed in a software development job

Issue and justification: 

As discussed in the course, tidiness of the data refers to how well structured the data is. We do this so that the data is easy to manipulate and analyze. Tidiness has a couple of considerations. These considerations are, each variable forms a column, each observation forms a row, and lastly each type of observational unit forms a table.

We will drop the column 'timestamp' and all columns after '.8' as it does not appropiately answer the problem we are trying to answer. 

### Quality Issue 2:

In [15]:
#FILL IN - Inspecting the dataframe visually
df.sample(10)


Unnamed: 0_level_0,MainBranch,Age,Employment,RemoteWork,Check,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,TechDoc,...,JobSatPoints_6,JobSatPoints_7,JobSatPoints_8,JobSatPoints_9,JobSatPoints_10,JobSatPoints_11,SurveyLength,SurveyEase,ConvertedCompYearly,JobSat
ResponseId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
33420,I am a developer by profession,35-44 years old,"Not employed, but looking for work",,Apples,,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,API document(s) and/or SDK document(s);User gu...,...,,,,,,,Appropriate in length,Easy,,
8142,I am a developer by profession,18-24 years old,"Student, full-time;Employed, part-time",Remote,Apples,Hobby;School or academic work,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Other online resources ...,Technical documentation;Books;Written Tutorial...,API document(s) and/or SDK document(s),...,60.0,0.0,0.0,0.0,0.0,0.0,Appropriate in length,Neither easy nor difficult,24703.0,8.0
26899,I am a developer by profession,35-44 years old,"Not employed, but looking for work",,Apples,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;On the job tr...,,,...,,,,,,,Too short,Neither easy nor difficult,,
28539,I am a developer by profession,35-44 years old,"Employed, full-time",Remote,Apples,Freelance/contract work,Some college/university study without earning ...,On the job training;Other online resources (e....,Written Tutorials;Stack Overflow;Social Media;...,,...,,,,,,,Too long,Neither easy nor difficult,,
44711,I am a developer by profession,25-34 years old,"Employed, full-time;Student, full-time;Indepen...",Remote,Apples,Hobby;School or academic work;Professional dev...,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Other online resources ...,Technical documentation;Blogs;Books;Written Tu...,API document(s) and/or SDK document(s);User gu...,...,15.0,5.0,10.0,10.0,10.0,0.0,Appropriate in length,Easy,,7.0
913,I am a developer by profession,25-34 years old,"Employed, full-time",In-person,Apples,Hobby,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;Other online ...,Technical documentation;Blogs;Books;Written Tu...,API document(s) and/or SDK document(s);User gu...,...,40.0,80.0,40.0,80.0,20.0,0.0,Too long,Easy,1889.0,5.0
35660,I am a developer by profession,18-24 years old,"Employed, full-time;Student, full-time",Remote,Apples,School or academic work,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;Other online ...,Technical documentation;Books;Written Tutorial...,API document(s) and/or SDK document(s);User gu...,...,,,,,,,Appropriate in length,Easy,2017.0,
20775,I am learning to code,18-24 years old,"Student, full-time",,Apples,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Other online resources (e.g., videos, blogs, f...",Blogs;Books;Written Tutorials;Stack Overflow;I...,,...,,,,,,,Appropriate in length,Neither easy nor difficult,,
51354,I am a developer by profession,55-64 years old,"Employed, full-time;Independent contractor, fr...",Remote,Apples,Hobby;Contribute to open-source projects;Schoo...,Some college/university study without earning ...,Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,API document(s) and/or SDK document(s);User gu...,...,0.0,0.0,0.0,0.0,0.0,0.0,,Neither easy nor difficult,,7.0
61251,I code primarily as a hobby,25-34 years old,"Employed, part-time",In-person,Apples,,,,,,...,,,,,,,,,,


In [45]:
#FILL IN - Inspecting the dataframe programmatically
df.isnull()

Unnamed: 0_level_0,MainBranch,Age,Employment,RemoteWork,Check,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,TechDoc,...,JobSatPoints_6,JobSatPoints_7,JobSatPoints_8,JobSatPoints_9,JobSatPoints_10,JobSatPoints_11,SurveyLength,SurveyEase,ConvertedCompYearly,JobSat
ResponseId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
5,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65433,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
65434,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
65435,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
65436,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


Issue and justification: 

On first observation, we could see the some columns like 'CodingActivities' and 'LearnCode' have multiple selections by delimeter(;). We  need to check for consistency and possibly transform these into separate columns.

With this in mind, observing the data it can be observed that in multiple columns variables forms columns that contain multiple variables. With this in mind  it must be cleaned and fixed in order to have data that is better to work with. We do this by creating serparate columns, such having a column for people that learn on an online learning platform  versus those who learn by physical media like books.


### Tidiness Issue 2: 

In [17]:
#FILL IN - Inspecting the dataframe visually
df.head(10)

Unnamed: 0_level_0,MainBranch,Age,Employment,RemoteWork,Check,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,TechDoc,...,JobSatPoints_6,JobSatPoints_7,JobSatPoints_8,JobSatPoints_9,JobSatPoints_10,JobSatPoints_11,SurveyLength,SurveyEase,ConvertedCompYearly,JobSat
ResponseId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,I am a developer by profession,Under 18 years old,"Employed, full-time",Remote,Apples,Hobby,Primary/elementary school,Books / Physical media,,,...,,,,,,,,,,
2,I am a developer by profession,35-44 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects;Other...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,API document(s) and/or SDK document(s);User gu...,...,0.0,0.0,0.0,0.0,0.0,0.0,,,,
3,I am a developer by profession,45-54 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects;Other...,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,API document(s) and/or SDK document(s);User gu...,...,,,,,,,Appropriate in length,Easy,,
4,I am learning to code,18-24 years old,"Student, full-time",,Apples,,Some college/university study without earning ...,"Other online resources (e.g., videos, blogs, f...",Stack Overflow;How-to videos;Interactive tutorial,,...,,,,,,,Too long,Easy,,
5,I am a developer by profession,18-24 years old,"Student, full-time",,Apples,,"Secondary school (e.g. American high school, G...","Other online resources (e.g., videos, blogs, f...",Technical documentation;Blogs;Written Tutorial...,API document(s) and/or SDK document(s);User gu...,...,,,,,,,Too short,Easy,,
6,I code primarily as a hobby,Under 18 years old,"Student, full-time",,Apples,,Primary/elementary school,"School (i.e., University, College, etc);Online...",,,...,,,,,,,Appropriate in length,Easy,,
7,"I am not primarily a developer, but I write co...",35-44 years old,"Employed, full-time",Remote,Apples,I don’t code outside of work,"Professional degree (JD, MD, Ph.D, Ed.D, etc.)","Other online resources (e.g., videos, blogs, f...",Technical documentation;Stack Overflow;Written...,,...,,,,,,,Too long,Neither easy nor difficult,,
8,I am learning to code,18-24 years old,"Student, full-time;Not employed, but looking f...",,Apples,,"Secondary school (e.g. American high school, G...","Other online resources (e.g., videos, blogs, f...",Technical documentation;Video-based Online Cou...,First-party knowledge base,...,,,,,,,Appropriate in length,Difficult,,
9,I code primarily as a hobby,45-54 years old,"Employed, full-time",In-person,Apples,Hobby,"Professional degree (JD, MD, Ph.D, Ed.D, etc.)",Books / Physical media;Other online resources ...,Stack Overflow;Written-based Online Courses,,...,,,,,,,Appropriate in length,Neither easy nor difficult,,
10,I am a developer by profession,35-44 years old,"Independent contractor, freelancer, or self-em...",Remote,Apples,Bootstrapping a business,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",On the job training;Other online resources (e....,Technical documentation;Blogs;Written Tutorial...,Traditional public search engine;AI-powered se...,...,,,,,,,Too long,Easy,,


In [18]:
#FILL IN - Inspecting the dataframe programmatically
df.info

<bound method DataFrame.info of                                 MainBranch                 Age  \
ResponseId                                                       
1           I am a developer by profession  Under 18 years old   
2           I am a developer by profession     35-44 years old   
3           I am a developer by profession     45-54 years old   
4                    I am learning to code     18-24 years old   
5           I am a developer by profession     18-24 years old   
...                                    ...                 ...   
65433       I am a developer by profession     18-24 years old   
65434       I am a developer by profession     25-34 years old   
65435       I am a developer by profession     25-34 years old   
65436       I am a developer by profession     18-24 years old   
65437          I code primarily as a hobby     18-24 years old   

                     Employment                            RemoteWork   Check  \
ResponseId                  

Issue and justification: 

On futher analyset there is many redundant columns of data. The dataset has columns named JobSatPoints_6,JobSatPoints_7, etc. Now only does that make the data more difficult to work with, it is not relevant to the problem he are trying to solve. Therefore, I will choose to delete all the data after LearnCodeOnline column.


## 3. Clean data
Clean the data to solve the 4 issues corresponding to data quality and tidiness found in the assessing step. **Make sure you include justifications for your cleaning decisions.**

After the cleaning for each issue, please use **either** the visually or programatical method to validate the cleaning was succesful.

At this stage, you are also expected to remove variables that are unnecessary for your analysis and combine your datasets. Depending on your datasets, you may choose to perform variable combination and elimination before or after the cleaning stage. Your dataset must have **at least** 4 variables after combining the data.

In [19]:
# FILL IN - Make copies of the datasets to ensure the raw dataframes 
# are not impacted
dataframe = df.copy()
dataframe_two =df_two.copy()

For cohesion purposes, I will reorganize the cells so that data quality and tidiness are united together.
Cleaning datasets will be worked on a dataset at a time to limit confusion.

In [20]:
dataframe.head(10)

Unnamed: 0_level_0,MainBranch,Age,Employment,RemoteWork,Check,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,TechDoc,...,JobSatPoints_6,JobSatPoints_7,JobSatPoints_8,JobSatPoints_9,JobSatPoints_10,JobSatPoints_11,SurveyLength,SurveyEase,ConvertedCompYearly,JobSat
ResponseId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,I am a developer by profession,Under 18 years old,"Employed, full-time",Remote,Apples,Hobby,Primary/elementary school,Books / Physical media,,,...,,,,,,,,,,
2,I am a developer by profession,35-44 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects;Other...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,API document(s) and/or SDK document(s);User gu...,...,0.0,0.0,0.0,0.0,0.0,0.0,,,,
3,I am a developer by profession,45-54 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects;Other...,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,API document(s) and/or SDK document(s);User gu...,...,,,,,,,Appropriate in length,Easy,,
4,I am learning to code,18-24 years old,"Student, full-time",,Apples,,Some college/university study without earning ...,"Other online resources (e.g., videos, blogs, f...",Stack Overflow;How-to videos;Interactive tutorial,,...,,,,,,,Too long,Easy,,
5,I am a developer by profession,18-24 years old,"Student, full-time",,Apples,,"Secondary school (e.g. American high school, G...","Other online resources (e.g., videos, blogs, f...",Technical documentation;Blogs;Written Tutorial...,API document(s) and/or SDK document(s);User gu...,...,,,,,,,Too short,Easy,,
6,I code primarily as a hobby,Under 18 years old,"Student, full-time",,Apples,,Primary/elementary school,"School (i.e., University, College, etc);Online...",,,...,,,,,,,Appropriate in length,Easy,,
7,"I am not primarily a developer, but I write co...",35-44 years old,"Employed, full-time",Remote,Apples,I don’t code outside of work,"Professional degree (JD, MD, Ph.D, Ed.D, etc.)","Other online resources (e.g., videos, blogs, f...",Technical documentation;Stack Overflow;Written...,,...,,,,,,,Too long,Neither easy nor difficult,,
8,I am learning to code,18-24 years old,"Student, full-time;Not employed, but looking f...",,Apples,,"Secondary school (e.g. American high school, G...","Other online resources (e.g., videos, blogs, f...",Technical documentation;Video-based Online Cou...,First-party knowledge base,...,,,,,,,Appropriate in length,Difficult,,
9,I code primarily as a hobby,45-54 years old,"Employed, full-time",In-person,Apples,Hobby,"Professional degree (JD, MD, Ph.D, Ed.D, etc.)",Books / Physical media;Other online resources ...,Stack Overflow;Written-based Online Courses,,...,,,,,,,Appropriate in length,Neither easy nor difficult,,
10,I am a developer by profession,35-44 years old,"Independent contractor, freelancer, or self-em...",Remote,Apples,Bootstrapping a business,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",On the job training;Other online resources (e....,Technical documentation;Blogs;Written Tutorial...,Traditional public search engine;AI-powered se...,...,,,,,,,Too long,Easy,,


### **Quality Issue 2: FILL IN**

In [27]:
#FILL IN - Apply the cleaning strategy

#Getting a deep understanding of the LearnCoding dataset, to see appropiate seperation strategies.
#I would like to separate this column into two different columns.
#One data set for learning code on physical means to learning code i.e. books or physical media.
#Another for learning code onn online mean to learning code. 
#Not only will this enchance the quality of the data, but also the tidiness rules since there is different observational units.  
#Split text values into a list of texts by delimeter.

dataframe['LearnCode'].str.split(';').explode().unique()


array(['Books / Physical media', 'Colleague', 'On the job training',
       'Other online resources (e.g., videos, blogs, forum, online community)',
       'School (i.e., University, College, etc)',
       'Online Courses or Certification', 'Coding Bootcamp',
       'Friend or family member', 'Other (please specify):', 'NA'],
      dtype=object)

In [43]:
#This shows that different values were selected by the same user to answer the  prompt.
dataframe_edited = dataframe['LearnCode'].str.split(';', expand=True)
dataframe_edited

Unnamed: 0_level_0,0,1,2,3,4,5,6,7,8
ResponseId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
1,Books / Physical media,,,,,,,,
2,Books / Physical media,Colleague,On the job training,"Other online resources (e.g., videos, blogs, f...",,,,,
3,Books / Physical media,Colleague,On the job training,"Other online resources (e.g., videos, blogs, f...","School (i.e., University, College, etc)",,,,
4,"Other online resources (e.g., videos, blogs, f...","School (i.e., University, College, etc)",Online Courses or Certification,,,,,,
5,"Other online resources (e.g., videos, blogs, f...",,,,,,,,
...,...,...,...,...,...,...,...,...,...
65433,On the job training,"School (i.e., University, College, etc)",,,,,,,
65434,,,,,,,,,
65435,"Other online resources (e.g., videos, blogs, f...",,,,,,,,
65436,On the job training,"Other online resources (e.g., videos, blogs, f...",,,,,,,


In [55]:
dataframe['LearnCode'].str.split(';').explode()


ResponseId
1                                   Books / Physical media
2                                   Books / Physical media
2                                                Colleague
2                                      On the job training
2        Other online resources (e.g., videos, blogs, f...
                               ...                        
65434                                                   NA
65435    Other online resources (e.g., videos, blogs, f...
65436                                  On the job training
65436    Other online resources (e.g., videos, blogs, f...
65437                                                   NA
Name: LearnCode, Length: 203006, dtype: object

In [52]:
#Add resulting data from split into two seperate columns.
#dataframe_edited.loc(loc=1, column= ''['In_person_Learning'] = ['Books / Physical media', 'On the job training', 'School (i.e., University, College, etc', 'Colleague', 'Friend or family member']

In [None]:
#FILL IN - Validate the cleaning was successful

Justification: *FILL IN*

### **Tidiness Issue 2: FILL IN**

In [1]:
#FILL IN - Apply the cleaning strategy

In [2]:
#FILL IN - Validate the cleaning was successful

Justification: *FILL IN*

Now, I will enchance the quality of the data and improve the tidiness of the second dataset.

In [63]:
dataframe_two.head(10)

Unnamed: 0,Timestamp,1. What is your biggest reason for learning to code?,2. What methods have you used to learn about coding? Please select all that apply.,3. Which online learning resources have you found helpful? Please select all that apply.,"4. If you have attended in-person coding-related events before, which ones have you found helpful? Please select all that apply.","5. If you have listened to coding-related podcasts before, which ones have you found helpful? Please select all that apply.","6. If you have watched coding-related YouTube videos before, which channels have you found helpful? Please select all that apply.",7. About how many hours do you spend learning each week?,8. About how many months have you been programming?,"9. Aside from university tuition, about how much money have you spent on learning to code so far (in US Dollars)?",...,45. Please tell us how satisfied you are with each of these following aspects of your present job [Job security],45. Please tell us how satisfied you are with each of these following aspects of your present job [Work-life balance],45. Please tell us how satisfied you are with each of these following aspects of your present job [Professional growth or leadership opportunities],45. Please tell us how satisfied you are with each of these following aspects of your present job [Workplace/company culture],45. Please tell us how satisfied you are with each of these following aspects of your present job [Diverse and inclusive work environment],45. Please tell us how satisfied you are with each of these following aspects of your present job [Weekly workload],46. About how many minutes does it take you to get to work each day?,47. Have you served in your country's military before?,48. Do you currently receive disability benefits from your government?,49. Do you have high speed internet at your home?
0,7/1/2021 10:10:23,To succeed in current career,"Online resources, Books, In-person bootcamps, ...","freeCodeCamp, Mozilla Developer Network (MDN),...","conferences, workshops, Meetup.com events",The Changelog,"CS Dojo, freeCodeCamp",4.0,120,,...,Somewhat satisfied,Somewhat dissatisfied,I do not know,Somewhat satisfied,Somewhat satisfied,Very dissatisfied,I work from home,No,No,Yes
1,7/1/2021 10:31:01,To change careers,"Online resources, Books, Online bootcamps","freeCodeCamp, Mozilla Developer Network (MDN),...",I haven't attended any in-person coding-relate...,"The Changelog, Code Newbie Podcast","Adrian Twarog, Code with Ania Kubów, Coder Cod...",10.0,6,30.0,...,Very dissatisfied,Somewhat satisfied,Somewhat dissatisfied,Somewhat dissatisfied,Somewhat satisfied,Somewhat satisfied,15 to 29 minutes,No,Yes,Yes
2,7/1/2021 10:42:31,To change careers,"Online resources, Books, Hackathons, Meetup.co...","freeCodeCamp, Mozilla Developer Network (MDN),...",Meetup.com events,I haven't listened to any podcasts,"AmigosCode, Dev Ed, freeCodeCamp, Kevin Powell...",30.0,48,300.0,...,Not Applicable,Not Applicable,Not Applicable,Not Applicable,Not Applicable,Not Applicable,I am not working,No,No,Yes
3,7/1/2021 11:06:43,As a hobby,"Online resources, Books","freeCodeCamp, Mozilla Developer Network (MDN),...",I haven't attended any in-person coding-relate...,"Darknet Diaries, Real Python Podcast","freeCodeCamp, Traversy Media",,36,0.0,...,,,,,,,I am not working,No,No,No
4,7/1/2021 11:14:31,To start your first career,"Online resources, Books, Online bootcamps","freeCodeCamp, Stack Overflow, Coursera, Udemy",I haven't attended any in-person coding-relate...,Talk Python to Me,"freeCodeCamp, The Net Ninja, Traversy Media",2.0,24,5000.0,...,Somewhat dissatisfied,Somewhat satisfied,Somewhat dissatisfied,Somewhat satisfied,Somewhat dissatisfied,Somewhat dissatisfied,45 to 60 minutes,No,No,Yes
5,7/1/2021 11:17:08,To succeed in current career,"Online resources, Books, Conferences","freeCodeCamp, Mozilla Developer Network (MDN),...",I haven't attended any in-person coding-relate...,Ladybug Podcast,"Coder Coder, Coding Train, freeCodeCamp, Googl...",10.0,50,200.0,...,Somewhat dissatisfied,Very satisfied,Somewhat dissatisfied,Somewhat dissatisfied,Somewhat dissatisfied,Somewhat satisfied,I work from home,No,No,Yes
6,7/1/2021 11:21:26,To start your first career,"Online resources, In-person bootcamps, Meetup....","freeCodeCamp, Mozilla Developer Network (MDN),...","freeCodeCamp study groups, Meetup.com events","Syntax.fm, Indie Hackers","Ben Awad, Coding Train, DesignCourse, Dev Ed, ...",5.0,36,10500.0,...,Very satisfied,Very satisfied,Very satisfied,Very satisfied,Very satisfied,Very satisfied,I work from home,No,No,Yes
7,7/1/2021 11:24:57,As a hobby,"Online resources, Hackathons, Workshops, Tinke...","freeCodeCamp, Mozilla Developer Network (MDN),...",I haven't attended any in-person coding-relate...,"The Changelog, Indie Hackers","Ben Awad, Coding Train, Dev Ed, freeCodeCamp, ...",20.0,30,0.0,...,Very satisfied,Somewhat satisfied,Very satisfied,Very satisfied,Very satisfied,Very satisfied,I work from home,No,No,Yes
8,7/1/2021 11:30:27,To change careers,"Online resources, Books, Online bootcamps","freeCodeCamp, Mozilla Developer Network (MDN),...",I haven't attended any in-person coding-relate...,The CSS Podcast,"Ben Awad, Code with Ania Kubów, CodeStacker, C...",20.0,12,400.0,...,Somewhat satisfied,Very satisfied,Somewhat dissatisfied,Very satisfied,Somewhat dissatisfied,Very satisfied,I work from home,No,No,Yes
9,7/1/2021 11:46:42,To start a business or to freelance,Online resources,Stack Overflow,I haven't attended any in-person coding-relate...,I haven't listened to any podcasts,"CS Dojo, freeCodeCamp, Programming With Mosh, ...",5.0,24,500.0,...,Very dissatisfied,Somewhat satisfied,I do not know,Somewhat dissatisfied,Very dissatisfied,Somewhat dissatisfied,I work from home,No,No,No


### **Quality Issue 1: FILL IN**

In [None]:
# FILL IN - Apply the cleaning strategy

In [None]:
# FILL IN - Validate the cleaning was successful

Justification: *FILL IN*

### **Tidiness Issue 1: FILL IN**

In [None]:
#FILL IN - Apply the cleaning strategy

In [None]:
#FILL IN - Validate the cleaning was successful

Justification: *FILL IN*

### **Remove unnecessary variables and combine datasets**

Depending on the datasets, you can also peform the combination before the cleaning steps.

In [None]:
#FILL IN - Remove unnecessary variables and combine datasets

## 4. Update your data store
Update your local database/data store with the cleaned data, following best practices for storing your cleaned data:

- Must maintain different instances / versions of data (raw and cleaned data)
- Must name the dataset files informatively
- Ensure both the raw and cleaned data is saved to your database/data store

In [None]:
#FILL IN - saving data

## 5. Answer the research question

### **5.1:** Define and answer the research question 
Going back to the problem statement in step 1, use the cleaned data to answer the question you raised. Produce **at least** two visualizations using the cleaned data and explain how they help you answer the question.

*Research question:* FILL IN from answer to Step 1

In [None]:
#Visual 1 - FILL IN

*Answer to research question:* FILL IN

In [None]:
#Visual 2 - FILL IN

*Answer to research question:* FILL IN

### **5.2:** Reflection
In 2-4 sentences, if you had more time to complete the project, what actions would you take? For example, which data quality and structural issues would you look into further, and what research questions would you further explore?

*Answer:* FILL IN