# News, Trust, and Data Access

## Data Cleaning

This Jupyter notebook cleans the coded survey responses into a tidy dataset for analysis and visualization.

The cleaned data is saved to a new CSV file named `news_trust_data__clean.csv`.

In [1]:
# Import needed packages
import pandas as pd

# Read in coded responses
survey = pd.read_csv('dataset/all_responses_coded.csv', index_col='index')

# View head of imported data
survey.head()

index,RespondentID,A1,A2,A3,A4,A5,A6,A7,A8,A9,...,A55,A56,A57,A58,A59,A60,A61,A62,StartDate,EndDate
0,6176264298,0,0,1,0,0,0,1,0,0,...,0,0,1,0,0,0,0,0,5/1/17 15:41,5/1/17 15:43
1,6176263960,0,0,1,0,0,0,1,0,0,...,0,0,0,0,0,1,0,0,5/1/17 15:35,5/1/17 15:43
2,6176258621,0,0,0,1,0,0,1,0,0,...,0,0,0,0,0,1,0,0,5/1/17 15:38,5/1/17 15:40
3,6176257082,0,0,0,1,0,0,1,0,0,...,1,0,1,0,0,0,0,0,5/1/17 15:38,5/1/17 15:39
4,6176256111,0,0,0,1,0,0,1,0,0,...,0,1,1,0,0,0,0,0,5/1/17 15:34,5/1/17 15:39


In [2]:
# Drop the Start and End Date columns, as we will not need them in this analysis
survey.drop(columns=['StartDate', 'EndDate'], inplace=True)

### Political Leanings

In general, how would you describe your views on most political issues? Are you:
1. Very conservative
2. Conservative
3. Moderate
4. Liberal
5. Very Liberal

*Note: Each answer option is coded as its own 0/1 column (A1, A2, A3, etc.) in the dataset.*

In [3]:
# Translate first question, with responses A1-A5, into single column
survey.loc[survey['A1'] == 1, 'Political_View'] = 'Very Conservative'
survey.loc[survey['A2'] == 1, 'Political_View'] = 'Conservative'
survey.loc[survey['A3'] == 1, 'Political_View'] = 'Moderate'
survey.loc[survey['A4'] == 1, 'Political_View'] = 'Liberal'
survey.loc[survey['A5'] == 1, 'Political_View'] = 'Very Liberal'

# Drop coded responses for Q1
survey.drop(columns=['A1', 'A2', 'A3', 'A4', 'A5'], inplace=True)

survey.head()

index,RespondentID,A6,A7,A8,A9,A10,A11,A12,A13,A14,...,A54,A55,A56,A57,A58,A59,A60,A61,A62,Political_View
0,6176264298,0,1,0,0,1,0,1,0,0,...,0,0,0,1,0,0,0,0,0,Moderate
1,6176263960,0,1,0,0,1,0,0,0,0,...,0,0,0,0,0,0,1,0,0,Moderate
2,6176258621,0,1,0,0,1,0,0,1,0,...,0,0,0,0,0,0,1,0,0,Liberal
3,6176257082,0,1,0,0,0,0,0,1,0,...,0,1,0,1,0,0,0,0,0,Liberal
4,6176256111,0,1,0,0,1,1,1,0,0,...,0,0,1,1,0,0,0,0,0,Liberal
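
The same collapse pattern (several 0/1 columns into one labelled column) recurs for most questions in this notebook. As a minimal sketch of one way to factor it out, the hypothetical helper `collapse_one_hot` below uses `idxmax` on toy data; it is not part of the original notebook. One assumed difference: if a respondent had more than one option marked, this picks the first marked column, whereas the repeated `.loc` assignments keep the last.

```python
import pandas as pd

def collapse_one_hot(df, mapping, new_col):
    """Collapse mutually exclusive 0/1 columns into one labelled column.

    `mapping` maps coded column names (e.g. 'A1') to answer labels.
    Hypothetical helper for illustration only.
    """
    codes = list(mapping)
    # idxmax alone would return the first column for an all-zero row,
    # so mask unanswered rows to keep them as missing values.
    answered = df[codes].sum(axis=1) > 0
    labels = df[codes].idxmax(axis=1).map(mapping).where(answered)
    return df.drop(columns=codes).assign(**{new_col: labels})

# Toy data standing in for the coded survey responses
toy = pd.DataFrame({
    'RespondentID': [1, 2, 3],
    'A1': [0, 0, 0],
    'A2': [1, 0, 0],
    'A3': [0, 1, 0],
})
views = {'A1': 'Very Conservative', 'A2': 'Conservative', 'A3': 'Moderate'}

result = collapse_one_hot(toy, views, 'Political_View')
print(result)
```

Respondent 3 answered none of the options in the toy data, so their `Political_View` stays missing instead of silently defaulting to the first label.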


### General Trust
In general, how much trust do you have in the press when it comes to reporting the news fully, accurately, and fairly?

6. Great Amount
7. Fair Amount
8. Not very much
9. None at all

In [4]:
# Translate second question, with responses A6-A9, into single column
survey.loc[survey['A6'] == 1, 'General_Trust'] = 'Great Amount'
survey.loc[survey['A7'] == 1, 'General_Trust'] = 'Fair Amount'
survey.loc[survey['A8'] == 1, 'General_Trust'] = 'Not very much'
survey.loc[survey['A9'] == 1, 'General_Trust'] = 'None at all'

# Drop coded responses for Q2
survey.drop(columns=['A6', 'A7', 'A8', 'A9'], inplace=True)

survey.head()

index,RespondentID,A10,A11,A12,A13,A14,A15,A16,A17,A18,...,A55,A56,A57,A58,A59,A60,A61,A62,Political_View,General_Trust
0,6176264298,1,0,1,0,0,0,1,0,0,...,0,0,1,0,0,0,0,0,Moderate,Fair Amount
1,6176263960,1,0,0,0,0,0,1,0,0,...,0,0,0,0,0,1,0,0,Moderate,Fair Amount
2,6176258621,1,0,0,1,0,0,0,0,0,...,0,0,0,0,0,1,0,0,Liberal,Fair Amount
3,6176257082,0,0,0,1,0,0,0,0,0,...,1,0,1,0,0,0,0,0,Liberal,Fair Amount
4,6176256111,1,1,1,0,0,0,1,1,0,...,0,1,1,0,0,0,0,0,Liberal,Fair Amount


### Trust of specific news outlets

Which of these news outlets do you trust when it comes to reporting the news fully, accurately and fairly?


10. The New York Times
11. The Wall Street Journal
12. USA TODAY
13. The Washington Post
14. Fox News
15. Breitbart
16. CNN
17. BuzzFeed News
18. Huffington Post
19. Time
20. U.S. News & World Report 
21. Other


In [5]:
# Rename the outlet columns (A10-A21) to readable names
survey.rename(columns={
    'A10': 'NYT', 'A11': 'WSJ', 'A12': 'USA_TODAY', 'A13': 'WaPo',
    'A14': 'FoxNews', 'A15': 'Breitbart', 'A16': 'CNN', 'A17': 'BuzzFeed_News',
    'A18': 'HuffPo', 'A19': 'Time', 'A20': 'USNWR', 'A21': 'Other',
}, inplace=True)

survey.head()

index,RespondentID,NYT,WSJ,USA_TODAY,WaPo,FoxNews,Breitbart,CNN,BuzzFeed_News,HuffPo,...,A55,A56,A57,A58,A59,A60,A61,A62,Political_View,General_Trust
0,6176264298,1,0,1,0,0,0,1,0,0,...,0,0,1,0,0,0,0,0,Moderate,Fair Amount
1,6176263960,1,0,0,0,0,0,1,0,0,...,0,0,0,0,0,1,0,0,Moderate,Fair Amount
2,6176258621,1,0,0,1,0,0,0,0,0,...,0,0,0,0,0,1,0,0,Liberal,Fair Amount
3,6176257082,0,0,0,1,0,0,0,0,0,...,1,0,1,0,0,0,0,0,Liberal,Fair Amount
4,6176256111,1,1,1,0,0,0,1,1,0,...,0,1,1,0,0,0,0,0,Liberal,Fair Amount


### Pay for News

Do you currently pay for access to any online news source (including online versions of print magazines, newspapers, and other publications)?

22. Yes
23. No

In [6]:
# Pay for News
survey.loc[survey['A22'] == 1, 'Pay_For_News'] = 'Yes'
survey.loc[survey['A23'] == 1, 'Pay_For_News'] = 'No'

survey.drop(columns=['A22', 'A23'], inplace=True)

survey.head()

index,RespondentID,NYT,WSJ,USA_TODAY,WaPo,FoxNews,Breitbart,CNN,BuzzFeed_News,HuffPo,...,A56,A57,A58,A59,A60,A61,A62,Political_View,General_Trust,Pay_For_News
0,6176264298,1,0,1,0,0,0,1,0,0,...,0,1,0,0,0,0,0,Moderate,Fair Amount,No
1,6176263960,1,0,0,0,0,0,1,0,0,...,0,0,0,0,1,0,0,Moderate,Fair Amount,No
2,6176258621,1,0,0,1,0,0,0,0,0,...,0,0,0,0,1,0,0,Liberal,Fair Amount,Yes
3,6176257082,0,0,0,1,0,0,0,0,0,...,0,1,0,0,0,0,0,Liberal,Fair Amount,No
4,6176256111,1,1,1,0,0,0,1,1,0,...,1,1,0,0,0,0,0,Liberal,Fair Amount,Yes


### Data Access

How would your impression of an online news article change if you could easily access the data behind the claims in the article?

24. Decrease Trust
25. Increase Trust
26. No change

In [7]:
# Data Access
survey.loc[survey['A24'] == 1, 'Data_Access'] = 'Decrease Trust'
survey.loc[survey['A25'] == 1, 'Data_Access'] = 'Increase Trust'
survey.loc[survey['A26'] == 1, 'Data_Access'] = 'No change'

survey.drop(columns=['A24', 'A25', 'A26'], inplace=True)

survey.head()

index,RespondentID,NYT,WSJ,USA_TODAY,WaPo,FoxNews,Breitbart,CNN,BuzzFeed_News,HuffPo,...,A57,A58,A59,A60,A61,A62,Political_View,General_Trust,Pay_For_News,Data_Access
0,6176264298,1,0,1,0,0,0,1,0,0,...,1,0,0,0,0,0,Moderate,Fair Amount,No,No change
1,6176263960,1,0,0,0,0,0,1,0,0,...,0,0,0,1,0,0,Moderate,Fair Amount,No,Increase Trust
2,6176258621,1,0,0,1,0,0,0,0,0,...,0,0,0,1,0,0,Liberal,Fair Amount,Yes,Increase Trust
3,6176257082,0,0,0,1,0,0,0,0,0,...,1,0,0,0,0,0,Liberal,Fair Amount,No,Increase Trust
4,6176256111,1,1,1,0,0,0,1,1,0,...,1,0,0,0,0,0,Liberal,Fair Amount,Yes,No change


### Trump Approval

Do you approve or disapprove of the way Donald Trump is handling his job as president?

27. Strongly approve
28. Somewhat approve
29. Somewhat disapprove
30. Strongly disapprove

In [8]:
# Trump Approval
survey.loc[survey['A27'] == 1, 'Trump_Approval'] = 'Strongly approve'
survey.loc[survey['A28'] == 1, 'Trump_Approval'] = 'Somewhat approve'
survey.loc[survey['A29'] == 1, 'Trump_Approval'] = 'Somewhat disapprove'
survey.loc[survey['A30'] == 1, 'Trump_Approval'] = 'Strongly disapprove'

survey.drop(columns=['A27', 'A28', 'A29', 'A30'], inplace=True)

survey.head()

index,RespondentID,NYT,WSJ,USA_TODAY,WaPo,FoxNews,Breitbart,CNN,BuzzFeed_News,HuffPo,...,A58,A59,A60,A61,A62,Political_View,General_Trust,Pay_For_News,Data_Access,Trump_Approval
0,6176264298,1,0,1,0,0,0,1,0,0,...,0,0,0,0,0,Moderate,Fair Amount,No,No change,Strongly disapprove
1,6176263960,1,0,0,0,0,0,1,0,0,...,0,0,1,0,0,Moderate,Fair Amount,No,Increase Trust,Somewhat disapprove
2,6176258621,1,0,0,1,0,0,0,0,0,...,0,0,1,0,0,Liberal,Fair Amount,Yes,Increase Trust,Strongly disapprove
3,6176257082,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,Liberal,Fair Amount,No,Increase Trust,Strongly disapprove
4,6176256111,1,1,1,0,0,0,1,1,0,...,0,0,0,0,0,Liberal,Fair Amount,Yes,No change,Strongly disapprove


### Age

31. 18-29
32. 30-44
33. 45-59
34. 60+

In [9]:
# Age
survey.loc[survey['A31'] == 1, 'Age'] = '18-29'
survey.loc[survey['A32'] == 1, 'Age'] = '30-44'
survey.loc[survey['A33'] == 1, 'Age'] = '45-59'
survey.loc[survey['A34'] == 1, 'Age'] = '60+'

survey.drop(columns=['A31', 'A32', 'A33', 'A34'], inplace=True)

survey.head()

index,RespondentID,NYT,WSJ,USA_TODAY,WaPo,FoxNews,Breitbart,CNN,BuzzFeed_News,HuffPo,...,A59,A60,A61,A62,Political_View,General_Trust,Pay_For_News,Data_Access,Trump_Approval,Age
0,6176264298,1,0,1,0,0,0,1,0,0,...,0,0,0,0,Moderate,Fair Amount,No,No change,Strongly disapprove,30-44
1,6176263960,1,0,0,0,0,0,1,0,0,...,0,1,0,0,Moderate,Fair Amount,No,Increase Trust,Somewhat disapprove,18-29
2,6176258621,1,0,0,1,0,0,0,0,0,...,0,1,0,0,Liberal,Fair Amount,Yes,Increase Trust,Strongly disapprove,30-44
3,6176257082,0,0,0,1,0,0,0,0,0,...,0,0,0,0,Liberal,Fair Amount,No,Increase Trust,Strongly disapprove,18-29
4,6176256111,1,1,1,0,0,0,1,1,0,...,0,0,0,0,Liberal,Fair Amount,Yes,No change,Strongly disapprove,30-44


### Gender

35. Female
36. Male

In [10]:
# Gender
survey.loc[survey['A35'] == 1, 'Gender'] = 'Female'
survey.loc[survey['A36'] == 1, 'Gender'] = 'Male'

survey.drop(columns=['A35', 'A36'], inplace=True)

survey.head()

index,RespondentID,NYT,WSJ,USA_TODAY,WaPo,FoxNews,Breitbart,CNN,BuzzFeed_News,HuffPo,...,A60,A61,A62,Political_View,General_Trust,Pay_For_News,Data_Access,Trump_Approval,Age,Gender
0,6176264298,1,0,1,0,0,0,1,0,0,...,0,0,0,Moderate,Fair Amount,No,No change,Strongly disapprove,30-44,Male
1,6176263960,1,0,0,0,0,0,1,0,0,...,1,0,0,Moderate,Fair Amount,No,Increase Trust,Somewhat disapprove,18-29,Female
2,6176258621,1,0,0,1,0,0,0,0,0,...,1,0,0,Liberal,Fair Amount,Yes,Increase Trust,Strongly disapprove,30-44,Male
3,6176257082,0,0,0,1,0,0,0,0,0,...,0,0,0,Liberal,Fair Amount,No,Increase Trust,Strongly disapprove,18-29,Male
4,6176256111,1,1,1,0,0,0,1,1,0,...,0,0,0,Liberal,Fair Amount,Yes,No change,Strongly disapprove,30-44,Male


### Income

37. 0 to 9,999
38. 10,000 to 24,999
39. 25,000 to 49,999 
40. 50,000 to 74,999 
41. 75,000 to 99,999 
42. 100,000 to 124,999 
43. 125,000 to 149,999 
44. 150,000 to 174,999 
45. 175,000 to 199,999 
46. 200,000 and up 
47. Prefer not to answer


In [11]:
# Income
survey.loc[survey['A37'] == 1, 'Income'] = '0-9,999'
survey.loc[survey['A38'] == 1, 'Income'] = '10,000-24,999'
survey.loc[survey['A39'] == 1, 'Income'] = '25,000-49,999'
survey.loc[survey['A40'] == 1, 'Income'] = '50,000-74,999'
survey.loc[survey['A41'] == 1, 'Income'] = '75,000-99,999'
survey.loc[survey['A42'] == 1, 'Income'] = '100,000-124,999'
survey.loc[survey['A43'] == 1, 'Income'] = '125,000-149,999'
survey.loc[survey['A44'] == 1, 'Income'] = '150,000-174,999'
survey.loc[survey['A45'] == 1, 'Income'] = '175,000-199,999'
survey.loc[survey['A46'] == 1, 'Income'] = '200,000+'
survey.loc[survey['A47'] == 1, 'Income'] = 'Prefer not to answer'

survey.drop(columns=['A37', 'A38', 'A39', 'A40', 'A41', 'A42', 'A43', 'A44', 'A45', 'A46', 'A47'], inplace=True)

survey.head()

index,RespondentID,NYT,WSJ,USA_TODAY,WaPo,FoxNews,Breitbart,CNN,BuzzFeed_News,HuffPo,...,A61,A62,Political_View,General_Trust,Pay_For_News,Data_Access,Trump_Approval,Age,Gender,Income
0,6176264298,1,0,1,0,0,0,1,0,0,...,0,0,Moderate,Fair Amount,No,No change,Strongly disapprove,30-44,Male,"10,000-24,999"
1,6176263960,1,0,0,0,0,0,1,0,0,...,0,0,Moderate,Fair Amount,No,Increase Trust,Somewhat disapprove,18-29,Female,"0-9,999"
2,6176258621,1,0,0,1,0,0,0,0,0,...,0,0,Liberal,Fair Amount,Yes,Increase Trust,Strongly disapprove,30-44,Male,"125,000-149,999"
3,6176257082,0,0,0,1,0,0,0,0,0,...,0,0,Liberal,Fair Amount,No,Increase Trust,Strongly disapprove,18-29,Male,"125,000-149,999"
4,6176256111,1,1,1,0,0,0,1,1,0,...,0,0,Liberal,Fair Amount,Yes,No change,Strongly disapprove,30-44,Male,"10,000-24,999"


### US Region

48. New England
49. Middle Atlantic
50. East North Central
51. West North Central 
52. South Atlantic
53. East South Central 
54. West South Central 
55. Mountain
56. Pacific

In [12]:
# Region
survey.loc[survey['A48'] == 1, 'Region'] = 'New England'
survey.loc[survey['A49'] == 1, 'Region'] = 'Middle Atlantic'
survey.loc[survey['A50'] == 1, 'Region'] = 'East North Central'
survey.loc[survey['A51'] == 1, 'Region'] = 'West North Central'
survey.loc[survey['A52'] == 1, 'Region'] = 'South Atlantic'
survey.loc[survey['A53'] == 1, 'Region'] = 'East South Central'
survey.loc[survey['A54'] == 1, 'Region'] = 'West South Central'
survey.loc[survey['A55'] == 1, 'Region'] = 'Mountain'
survey.loc[survey['A56'] == 1, 'Region'] = 'Pacific'

survey.drop(columns=['A48', 'A49', 'A50', 'A51', 'A52', 'A53', 'A54', 'A55', 'A56'], inplace=True)

survey.head()

index,RespondentID,NYT,WSJ,USA_TODAY,WaPo,FoxNews,Breitbart,CNN,BuzzFeed_News,HuffPo,...,A62,Political_View,General_Trust,Pay_For_News,Data_Access,Trump_Approval,Age,Gender,Income,Region
0,6176264298,1,0,1,0,0,0,1,0,0,...,0,Moderate,Fair Amount,No,No change,Strongly disapprove,30-44,Male,"10,000-24,999",East South Central
1,6176263960,1,0,0,0,0,0,1,0,0,...,0,Moderate,Fair Amount,No,Increase Trust,Somewhat disapprove,18-29,Female,"0-9,999",Middle Atlantic
2,6176258621,1,0,0,1,0,0,0,0,0,...,0,Liberal,Fair Amount,Yes,Increase Trust,Strongly disapprove,30-44,Male,"125,000-149,999",East North Central
3,6176257082,0,0,0,1,0,0,0,0,0,...,0,Liberal,Fair Amount,No,Increase Trust,Strongly disapprove,18-29,Male,"125,000-149,999",Mountain
4,6176256111,1,1,1,0,0,0,1,1,0,...,0,Liberal,Fair Amount,Yes,No change,Strongly disapprove,30-44,Male,"10,000-24,999",Pacific


### Device

57. iOS Phone / Tablet 
58. Android Phone / Tablet 
59. Other Phone / Tablet 
60. Windows Desktop / Laptop 
61. MacOS Desktop / Laptop 
62. Other Desktop / Laptop

In [13]:
# Device
survey.loc[survey['A57'] == 1, 'Device'] = 'iOS Phone/Tablet'
survey.loc[survey['A58'] == 1, 'Device'] = 'Android Phone/Tablet'
survey.loc[survey['A59'] == 1, 'Device'] = 'Other Phone/Tablet'
survey.loc[survey['A60'] == 1, 'Device'] = 'Windows Desktop/Laptop'
survey.loc[survey['A61'] == 1, 'Device'] = 'MacOS Desktop/Laptop'
survey.loc[survey['A62'] == 1, 'Device'] = 'Other Desktop/Laptop'

survey.drop(columns=['A57', 'A58', 'A59', 'A60', 'A61', 'A62'], inplace=True)

survey.head()

index,RespondentID,NYT,WSJ,USA_TODAY,WaPo,FoxNews,Breitbart,CNN,BuzzFeed_News,HuffPo,...,Political_View,General_Trust,Pay_For_News,Data_Access,Trump_Approval,Age,Gender,Income,Region,Device
0,6176264298,1,0,1,0,0,0,1,0,0,...,Moderate,Fair Amount,No,No change,Strongly disapprove,30-44,Male,"10,000-24,999",East South Central,iOS Phone/Tablet
1,6176263960,1,0,0,0,0,0,1,0,0,...,Moderate,Fair Amount,No,Increase Trust,Somewhat disapprove,18-29,Female,"0-9,999",Middle Atlantic,Windows Desktop/Laptop
2,6176258621,1,0,0,1,0,0,0,0,0,...,Liberal,Fair Amount,Yes,Increase Trust,Strongly disapprove,30-44,Male,"125,000-149,999",East North Central,Windows Desktop/Laptop
3,6176257082,0,0,0,1,0,0,0,0,0,...,Liberal,Fair Amount,No,Increase Trust,Strongly disapprove,18-29,Male,"125,000-149,999",Mountain,iOS Phone/Tablet
4,6176256111,1,1,1,0,0,0,1,1,0,...,Liberal,Fair Amount,Yes,No change,Strongly disapprove,30-44,Male,"10,000-24,999",Pacific,iOS Phone/Tablet


### Check for null values and remove unwanted columns

In [14]:
# Drop unwanted columns
survey.drop(columns=['Other', 'Region', 'Device'], inplace=True)

In [15]:
survey.isnull().sum()

RespondentID      0
NYT               0
WSJ               0
USA_TODAY         0
WaPo              0
FoxNews           0
Breitbart         0
CNN               0
BuzzFeed_News     0
HuffPo            0
Time              0
USNWR             0
Political_View    0
General_Trust     0
Pay_For_News      0
Data_Access       0
Trump_Approval    0
Age               2
Gender            2
Income            2
dtype: int64

In [16]:
# Drop null values
survey.dropna(inplace=True)
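
Before dropping rows, it can help to look at which respondents are incomplete and how many rows will be lost. The sketch below shows one way to do that on toy data; the frame `toy` and its values are made up for illustration and are not taken from the survey.

```python
import numpy as np
import pandas as pd

# Toy data: respondent 2 skipped the demographic questions
toy = pd.DataFrame({
    'RespondentID': [1, 2, 3],
    'Age': ['18-29', np.nan, '60+'],
    'Gender': ['Female', np.nan, 'Male'],
})

# Rows with at least one missing value
incomplete = toy[toy.isnull().any(axis=1)]
print(incomplete)

# Drop them and report how many rows were removed
cleaned = toy.dropna()
dropped = len(toy) - len(cleaned)
print(dropped, 'row(s) dropped')
```

If the same respondents are missing `Age`, `Gender`, and `Income`, the number of dropped rows equals the per-column null count; otherwise it can be larger.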

### Melt into Tidy dataset

In [17]:
# Melt the per-outlet trust columns into one row per respondent per news source
tidy_survey = survey.melt(
    id_vars=['RespondentID', 'Political_View', 'General_Trust', 'Pay_For_News',
             'Data_Access', 'Trump_Approval', 'Age', 'Gender', 'Income'],
    var_name='NewsSource', value_name='SourceTrust')
tidy_survey.head()

Unnamed: 0,RespondentID,Political_View,General_Trust,Pay_For_News,Data_Access,Trump_Approval,Age,Gender,Income,NewsSource,SourceTrust
0,6176264298,Moderate,Fair Amount,No,No change,Strongly disapprove,30-44,Male,"10,000-24,999",NYT,1
1,6176263960,Moderate,Fair Amount,No,Increase Trust,Somewhat disapprove,18-29,Female,"0-9,999",NYT,1
2,6176258621,Liberal,Fair Amount,Yes,Increase Trust,Strongly disapprove,30-44,Male,"125,000-149,999",NYT,1
3,6176257082,Liberal,Fair Amount,No,Increase Trust,Strongly disapprove,18-29,Male,"125,000-149,999",NYT,0
4,6176256111,Liberal,Fair Amount,Yes,No change,Strongly disapprove,30-44,Male,"10,000-24,999",NYT,1
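
To make the reshaping concrete, here is a small sketch of `melt` on a toy frame with two outlets. Each respondent ends up with one row per outlet, which makes per-outlet summaries a single `groupby`. The toy values and the `share` summary are illustrative assumptions, not results from the survey.

```python
import pandas as pd

# Toy wide-format data: one 0/1 trust column per outlet
toy = pd.DataFrame({
    'RespondentID': [1, 2],
    'Political_View': ['Moderate', 'Liberal'],
    'NYT': [1, 0],
    'WSJ': [0, 1],
})

# Columns not listed in id_vars are stacked into NewsSource / SourceTrust
tidy = toy.melt(id_vars=['RespondentID', 'Political_View'],
                var_name='NewsSource', value_name='SourceTrust')
print(tidy)

# Share of respondents who trust each outlet
share = tidy.groupby('NewsSource')['SourceTrust'].mean()
print(share)
```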


## Save cleaned data to new CSV

In [18]:
# Save the tidied dataset with the '__clean' suffix
tidy_survey.to_csv('dataset/news_trust_data__clean.csv', index=False)
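
Because the income labels contain commas, a quick round-trip check can confirm that the saved file reads back as expected. The sketch below uses an in-memory buffer and toy data rather than the real file, so the names `toy`, `buffer`, and `restored` are illustrative.

```python
import io

import pandas as pd

# Toy tidy data with comma-containing income labels
toy = pd.DataFrame({
    'RespondentID': [1, 2],
    'Income': ['0-9,999', '10,000-24,999'],
    'SourceTrust': [1, 0],
})

# Write to an in-memory buffer instead of disk, then read it back
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

`to_csv` quotes fields that contain the delimiter, so the income ranges should come back as single string values rather than being split across columns.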