# Political Opinion Data from the Pew Research Center
## Dataset Description and Initial Wrangling/Cleaning

### Overview

This data comes from the Pew Research Center, a respected research institution that conducts surveys to measure public opinion on various political topics. The following data was taken manually from the Pew Research Center's 2017 Political Typology report.

### Dataset Questions

The questions and statements I took from this data were the following:

1) Percent which responded strongly favor or favor to the following question:  "Do you strongly favor, favor, oppose, or strongly oppose allowing gays and lesbians to marry legally?"

2) Percent which agreed the following statement came closest to their views: "Homosexuality should be accepted by society."

3) Percent which responded as satisfied to the following question: "All in all, are you satisfied or dissatisfied with the way things are going in this country today?"

4) Percent which agreed the following statement regarding immigrants came closest to their views: "Immigrants today strengthen our country because of their hard work and talents"

5) Percent which agreed the following statement came closest to their views: "The Islamic religion is more likely than others to encourage violence among its believers"

6) Percent which agreed the following statement came closest to their views: "Racial discrimination is the main reason why many black people can't get ahead these days."


### Source

https://www.pewresearch.org/politics/dataset/political-typology-2017/

### Limitations

The following are limitations for this dataset:

1. Because this is survey data, the numbers reflect only the opinions of the people who responded to the survey, which could introduce bias.
2. Some years contained multiple results for each question. For years with multiple results, I took the average of the value I was measuring. 
3. As shown below, some years are missing data. These gaps are forward filled, so each missing value takes the value from the most recent earlier year in the same column.
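
The averaging described in limitation 2 was done by hand while transcribing, so it does not appear in the code below. As a rough sketch of the same idea in pandas, assuming a hypothetical frame `raw` with one row per survey reading (the two 2015 readings here are made up for illustration, not taken from the report):

```python
import pandas as pd

# Hypothetical raw readings: 2015 has two results, 2016 has one.
# These numbers are illustrative only.
raw = pd.DataFrame({
    "year": [2015, 2015, 2016],
    "percent_favor_gay_marriage": [54.0, 57.0, 55.0],
})

# Average every reading that shares a year, leaving one row per year.
yearly = raw.groupby("year", as_index=False).mean()
print(yearly)
```

A plain mean treats each reading equally; it does not weight by sample size, which may or may not matter depending on how different the underlying surveys were.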

### Feature Descriptions

The columns include the year and one column for each of the questions and/or statements listed above under "Dataset Questions." Each column holds a value for the years 2000-2020, expressed as a percentage of respondents.
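
Since the values were entered manually, a quick sanity check on the ranges can catch typing mistakes. The following is only a sketch: `check_percent_columns` is a hypothetical helper, not part of this notebook, and the sample frames are small stand-ins for the real data.

```python
import pandas as pd

def check_percent_columns(frame: pd.DataFrame) -> bool:
    """Return True if years fall in 2000-2020 and all other values are in 0-100 (NaN allowed)."""
    years_ok = frame["year"].between(2000, 2020).all()
    values = frame.drop(columns=["year"])
    # A cell passes if it is missing or lies within the percentage range.
    in_range = (values.isna() | ((values >= 0) & (values <= 100))).all().all()
    return bool(years_ok and in_range)

# Stand-in frames: one plausible, one with an obvious typo (165 instead of 65).
sample = pd.DataFrame({"year": [2016, 2017], "percent_favor_immigrants": [61.0, 65.0]})
bad = pd.DataFrame({"year": [2016, 2017], "percent_favor_immigrants": [61.0, 165.0]})

print(check_percent_columns(sample))
print(check_percent_columns(bad))
```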

### Initial Wrangling / Cleaning

In [32]:
# imports pandas for data manipulation

import pandas as pd

In [33]:
# loads dataframe from pew research csv

df = pd.read_csv(r"C:\Users\14802\Desktop\hate-crime analysis\datasets\pew_research_data.csv")

In [34]:
# initial look at dataframe

df.head(25)

Unnamed: 0,year,percent_favor_gay_marriage,percent_accepting_homosexuality,percent_satisfied_with_current_us,percent_favor_immigrants,percent_agree_islam_encourages_violence,percent_agree_discrimination_hurts_black_people,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11
0,2020,,,,,,,,,,,
1,2019,,,,,,,,,,,
2,2018,,,,,,,,,,,
3,2017,62,70,29,65,43,41,,,,,
4,2016,55,63,28.6,61,41,32,,,,,
5,2015,55.5,61.5,29.3,52,46,30,,,,,
6,2014,51.5,62,27,57,43.7,27,,,,,
7,2013,49,58.5,25.5,49,42,,,,,,
8,2012,48,56,27.9,48,,21,,,,,
9,2011,45.5,58,23,45,40,26,,,,,


In [35]:
# keeps only the first 21 rows (years 2020 down to 2000)

df = df.iloc[0:21].copy()

In [36]:
# reverses row order to be ascending by year

df = df.loc[::-1]

In [37]:
# resets and drops index column

df = df.reset_index(drop=True)

In [38]:
# selects only desired columns

df = df[['year',
 'percent_favor_gay_marriage',
 'percent_accepting_homosexuality',
 'percent_satisfied_with_current_us',
 'percent_favor_immigrants',
 'percent_agree_islam_encourages_violence',
 'percent_agree_discrimination_hurts_black_people']].copy()

In [40]:
df.ffill(inplace=True) # forward fills NaN values in each column
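
To make the effect of the forward fill concrete, here is a small sketch on a toy frame. The four values are the 2011-2014 readings for one column from the raw table above, reordered to ascending year; the names `toy` and `filled` are only for illustration.

```python
import numpy as np
import pandas as pd

# 2012 is missing for this column in the raw data.
toy = pd.DataFrame({
    "year": [2011, 2012, 2013, 2014],
    "percent_agree_islam_encourages_violence": [40.0, np.nan, 42.0, 43.7],
})

# ffill copies the last non-missing value downward, so 2012 takes the 2011 value.
filled = toy.ffill()
print(filled)
```

Because the rows are sorted ascending by year before filling, the empty 2018-2020 rows take the 2017 values. A missing value in the very first row would stay NaN, since there is no earlier row to copy from.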

In [41]:
df

Unnamed: 0,year,percent_favor_gay_marriage,percent_accepting_homosexuality,percent_satisfied_with_current_us,percent_favor_immigrants,percent_agree_islam_encourages_violence,percent_agree_discrimination_hurts_black_people
0,2000,35.0,50.0,50.0,50.0,25.0,31
1,2001,35.0,50.0,48.2,50.0,25.0,31
2,2002,35.0,50.0,44.6,50.0,25.0,31
3,2003,32.6,47.0,43.2,46.0,44.0,24
4,2004,31.0,49.0,38.3,45.0,46.0,27
5,2005,36.0,49.0,35.8,45.0,36.0,26
6,2006,35.6,51.0,30.1,41.0,36.0,26
7,2007,36.3,51.0,29.4,41.0,45.0,26
8,2008,39.0,51.0,19.1,41.0,45.0,26
9,2009,37.0,51.0,27.1,46.0,38.0,18


In [42]:
# makes csv of clean dataframe for use in other notebooks

# df.to_csv('clean_pew_research.csv', index=False)