# Data Management - Existing Values

**Learning Objectives:**
- Learn to recode variables
- Learn to create new variables
- Learn to use dictionaries


### Filtering Observations (Rows)

Data is messy. Most of the time, you need to filter some observations (rows) out of your dataset, typically for one of two reasons:
- You are interested in a particular subset of your dataset (e.g., young voters).
- Some observations carry no usable information and must be removed to avoid drawing wrong conclusions (e.g., people who refuse to answer).

Therefore, you need a way to filter the observations in your dataset. 

- Relational operators (`>`, `<`, `==`, and so on) provide a way to subset observations.
- pandas also provides methods, such as `between`, that can help with this.
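
As a minimal sketch of both approaches, the toy DataFrame below is filtered first with relational operators and then with the `between` and `isin` methods. The values are made up for illustration and are not taken from the ANES; the names `toy`, `young`, `young_valid`, `two_party`, and `same_rows` are assumptions.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the ANES)
toy = pd.DataFrame({
    "age": [19, 25, 34, 52, 71],
    "vote_choice": [1, 2, -9, 1, 3],
})

# Relational operator: the comparison returns a boolean Series,
# and indexing with it keeps only the rows marked True
young = toy[toy["age"] < 30]

# Combine conditions with & (and) or | (or); wrap each one in parentheses
young_valid = toy[(toy["age"] < 30) & (toy["vote_choice"] > 0)]

# Method: between() is inclusive on both ends by default
two_party = toy[toy["vote_choice"].between(1, 2)]

# Method: isin() keeps rows whose value appears in the given list
same_rows = toy[toy["vote_choice"].isin([1, 2])]
```

Here `two_party` and `same_rows` select the same rows; which form reads better depends on whether the valid values form a range or an arbitrary list.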

![](https://pandas.pydata.org/docs/_images/05_newcolumn_1.svg)

![](https://pandas.pydata.org/docs/_images/05_newcolumn_2.svg)

We have seen that the ANES data contains values such as -8, -9, and 99 that may or may not be useful, depending on the problem we want to tackle.

How can we deal with them?
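
Two common options, sketched here on made-up values, are to drop the rows that carry such codes or to keep the rows and mark the codes as missing. Which codes count as missing depends on the variable, so the list `missing_codes` below is an assumption, not a rule taken from the ANES codebook.

```python
import pandas as pd

# Toy column (hypothetical values) mixing valid answers with special codes
toy = pd.DataFrame({"ideology": [1, 4, -9, 7, 99, -8, 3]})

# Codes assumed to mean "no usable answer"; check the codebook for each variable
missing_codes = [-9, -8, 99]

# Option 1: drop the rows that carry one of those codes
valid = toy[~toy["ideology"].isin(missing_codes)]

# Option 2: keep every row but mark the codes as missing (NaN)
marked = toy["ideology"].replace(missing_codes, float("nan"))
```

Dropping rows is simpler when only one variable matters; marking values as missing keeps the row available for analyses of other columns.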

Let's start with this question to illustrate:

- Are young voters more liberal or more conservative?


In [15]:
# Load Pandas
import pandas as pd

# Import data (read as a gzip-compressed CSV)
data_url = "https://raw.githubusercontent.com/datamisc/ts-2020/main/data.csv"
anes_data = pd.read_csv(data_url, compression="gzip")




In [49]:
# Subsetting & Renaming Variables
my_vars = [
    "V202073",  # vote choice
    "V201507x",  # age
    "V201200",  # liberal-conservative self-placement
    "V200010b",  # post-election weights
]

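# Take an explicit copy so later changes do not touch anes_data
df = anes_data[my_vars].copy()

# Give the columns readable names (same order as my_vars)
df.columns = ["vote_choice", "age", "ideology", "weight"]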

df.head()


|   | vote_choice | age | ideology | weight |
|---|---|---|---|---|
| 0 | -1 | 46 | 6 | 1.00573747444017 |
| 1 | 3 | 37 | 4 | 1.16347312190926 |
| 2 | 1 | 40 | 2 | 0.768681110109909 |
| 3 | 1 | 41 | 3 | 0.521019534357875 |
| 4 | 2 | 72 | 5 | 0.965789166832749 |


In [50]:
# Using filters: keep only vote choices 1 and 2 (between() is inclusive)
mask = df["vote_choice"].between(1, 2)
filter_df = df[mask].copy()

# Share of each remaining vote choice
filter_df["vote_choice"].value_counts(normalize=True)


1    0.570257
2    0.429743
Name: vote_choice, dtype: float64

In [71]:
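# Keep respondents with a positive vote-choice code (drops negative codes)
mask = df["vote_choice"] > 0
recoded_df = df[mask].copy()

# Cast to object so string labels can replace the integer codes
recoded_df["vote_choice"] = recoded_df["vote_choice"].astype(object)

# Group every code above 2 into a single "Other" category
mask = recoded_df["vote_choice"] > 2
recoded_df.loc[mask, "vote_choice"] = "Other"

# Label the two remaining codes
recoded_df.loc[recoded_df["vote_choice"] == 1, "vote_choice"] = "J.Biden"
recoded_df.loc[recoded_df["vote_choice"] == 2, "vote_choice"] = "D.Trump"

recoded_df["vote_choice"].value_counts(normalize=True)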


J.Biden    0.553823
D.Trump    0.417359
Other      0.028818
Name: vote_choice, dtype: float64
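
Because this recode is a lookup from codes to labels, it can also be sketched with a dictionary and `map`. The toy values below are made up, the grouping of all other positive codes into "Other" follows the cell above, and the names `toy`, `valid`, `labels`, and `shares` are assumptions for illustration.

```python
import pandas as pd

# Toy vote codes (hypothetical values); negative codes are dropped below
toy = pd.DataFrame({"vote_choice": [1, 2, 3, -1, 1, 5, 2, 1]})

# Keep positive codes only, as in the cell above
valid = toy[toy["vote_choice"] > 0].copy()

# Dictionary from numeric code to label
labels = {1: "J.Biden", 2: "D.Trump"}

# map() looks up each code in the dictionary; codes that are not keys
# become NaN, which fillna() then turns into "Other"
valid["vote_choice"] = valid["vote_choice"].map(labels).fillna("Other")

# Share of each label among the valid answers
shares = valid["vote_choice"].value_counts(normalize=True)
```

Keeping the labels in one dictionary means a new category only requires a new key, rather than another `.loc` assignment.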