# Snapchat Political Ads
This project uses political ads data from Snapchat, a popular social media app. Interesting questions to consider include:
- What are the most prevalent organizations, advertisers, and ballot candidates in the data? Do you recognize any?
- What are the characteristics of ads with a large reach, i.e., many views? What may a campaign consider when maximizing an ad's reach?
- What are the characteristics of ads with a smaller reach, i.e., fewer views? Aside from funding constraints, why might a campaign want to produce an ad with a smaller but more targeted reach?
- What are the characteristics of the most expensive ads? If a campaign is limited on advertising funds, what type of ad may the campaign consider?
- What groups or regions are targeted frequently? (For example, for single-gender campaigns, are men or women targeted more frequently?) What groups or regions are targeted less frequently? Why? Does this depend on the type of campaign?
- Have the characteristics of ads changed over time (e.g. over the past year)?
- When is the most common local time of day for an ad's start date? What about the most common day of week? (Make sure to account for time zones for both questions.)

### Getting the Data
The data and its corresponding data dictionary are downloadable [here](https://www.snap.com/en-US/political-ads/). Download both the 2018 CSV and the 2019 CSV.

The CSVs have the same filename; rename the CSVs as needed.

Note that the CSVs have the exact same columns and the exact same data dictionaries (`readme.txt`).

### Cleaning and EDA
- Concatenate the 2018 CSV and the 2019 CSV into one DataFrame so that we have data from both years.
- Clean the data.
    - Convert `StartDate` and `EndDate` into datetime. Make sure the datetimes are in the correct time zone. You can use any time zone you want (e.g., UTC) as long as you are consistent. However, if you want to answer a question like "When is the most common local time of day for an ad's start date?", you will need to convert time zones as needed. See Hint 2 below for more information.
- Understand the data in ways relevant to your question using univariate and bivariate analysis of the data as well as aggregations.

*Hint 1: What is the "Z" at the end of each timestamp?*

*Hint 2: `pd.to_datetime` will be useful here. `Series.dt.tz_convert` will be useful if a change in time zone is needed.*

*Tip: To visualize geospatial data, consider [Folium](https://python-visualization.github.io/folium/) or another geospatial plotting library.*
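
As a rough illustration of the cleaning steps above, here is a minimal sketch on two tiny, made-up stand-ins for the yearly CSVs. The rows, the in-memory "files", and the choice of `America/New_York` are assumptions for demonstration only. In the real data, ads run in many countries, so the local time zone would have to be chosen per ad rather than fixed.

```python
import io

import pandas as pd

# Made-up stand-ins for the 2018 and 2019 CSVs (same columns, invented rows).
csv_2018 = io.StringIO(
    "ADID,StartDate,EndDate\n"
    "a1,2018/10/17 15:00:00Z,2018/11/07 04:00:00Z\n"
)
csv_2019 = io.StringIO(
    "ADID,StartDate,EndDate\n"
    "b1,2019/03/01 02:30:00Z,\n"
)

# Stack both years into one DataFrame with a fresh 0..n-1 index.
ads = pd.concat(
    [pd.read_csv(csv_2018), pd.read_csv(csv_2019)], ignore_index=True
)

# Parse the timestamps as time-zone-aware UTC datetimes.
# Empty cells become NaT instead of raising an error.
for col in ["StartDate", "EndDate"]:
    ads[col] = pd.to_datetime(ads[col], format="%Y/%m/%d %H:%M:%SZ", utc=True)

# Convert to an illustrative local time zone before extracting the hour.
start_eastern = ads["StartDate"].dt.tz_convert("America/New_York")
start_hour = start_eastern.dt.hour
start_day = start_eastern.dt.day_name()
```

Converting before calling `.dt.hour` or `.dt.day_name()` matters: an ad that starts at 02:30 UTC falls on the previous evening in US Eastern time, which changes both the hour and the day of week.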

### Assessment of Missingness
Many columns that contain `NaN` values may not actually have missing data. How come? In some cases, a null or empty value corresponds to an actual, meaningful value. For example, `readme.txt` states the following about `Gender`:

>  Gender - Gender targeting criteria used in the Ad. If empty, then it is targeting all genders

In this scenario, an empty `Gender` value (which is read in as `NaN` in pandas) corresponds to "all genders".

- Refer to the data dictionary to determine which columns do **not** belong to the scenario above. Assess the missingness of one of these columns.
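
One possible way to separate meaningful empties from genuinely missing values is sketched below on a made-up frame. The rows, the `"ALL"` label, and the column `ColumnX` are hypothetical. Which real columns fall into which scenario has to be read off `readme.txt`.

```python
import numpy as np
import pandas as pd

# Made-up rows: `Gender` uses empty to mean "all genders", while the
# hypothetical `ColumnX` stands for a column with no such documented meaning.
ads = pd.DataFrame({
    "Gender": ["FEMALE", np.nan, np.nan, "MALE"],
    "ColumnX": [1.0, np.nan, 3.0, np.nan],
})

# Replace documented "empty means something" values with an explicit label
# ("ALL" is an arbitrary choice for this sketch).
ads["Gender"] = ads["Gender"].fillna("ALL")

# Whatever is still NaN afterwards is a candidate for a missingness analysis.
prop_missing = ads.isna().mean()
```

After this step, `prop_missing` is zero for `Gender` and nonzero only for columns whose empties are not explained by the data dictionary.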

### Hypothesis Test / Permutation Test
Find a hypothesis test or permutation test to perform. You can use the questions at the top of the notebook for inspiration.
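
For reference, a permutation test on a difference in group means might look like the sketch below. The data, the column names `group` and `value`, the helper `diff_in_means`, and the number of repetitions are all made up for illustration. The real test statistic and grouping depend on the question you choose.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Made-up data: a numeric value observed in two hypothetical groups.
df = pd.DataFrame({
    "group": ["A"] * 6 + ["B"] * 6,
    "value": [10, 12, 9, 11, 13, 10, 14, 15, 13, 16, 15, 17],
})


def diff_in_means(values, labels):
    """Mean of group B minus mean of group A."""
    values = np.asarray(values)
    labels = np.asarray(labels)
    return values[labels == "B"].mean() - values[labels == "A"].mean()


observed = diff_in_means(df["value"], df["group"])

# Under the null hypothesis the labels are exchangeable, so shuffle them
# repeatedly and recompute the statistic each time.
n_reps = 2000
simulated = np.array([
    diff_in_means(df["value"], rng.permutation(df["group"].to_numpy()))
    for _ in range(n_reps)
])

# Two-sided p-value: share of shuffles at least as extreme as the observation.
p_value = np.mean(np.abs(simulated) >= abs(observed))
```

Shuffling the labels rather than the values keeps the group sizes fixed, which is what makes the simulated statistics comparable to the observed one.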

# Summary of Findings

### Introduction
TODO

### Cleaning and EDA
TODO

### Assessment of Missingness
TODO

### Hypothesis Test
TODO

# Code

In [7]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'  # Higher resolution figures
pd.set_option("display.max_columns", None)

In [31]:
table = pd.read_csv('PoliticalAds.csv')
table

Unnamed: 0,ADID,CreativeUrl,Currency Code,Spend,Impressions,StartDate,EndDate,OrganizationName,BillingAddress,CandidateBallotInformation,PayingAdvertiserName,Gender,AgeBracket,CountryCode,Regions (Included),Regions (Excluded),Electoral Districts (Included),Electoral Districts (Excluded),Radius Targeting (Included),Radius Targeting (Excluded),Metros (Included),Metros (Excluded),Postal Codes (Included),Postal Codes (Excluded),Location Categories (Included),Location Categories (Excluded),Interests,OsType,Segments,Language,AdvancedDemographics,Targeting Connection Type,Targeting Carrier (ISP),CreativeProperties
0,9ef31071d90129e35a582d07fb50b21f40e98235e80d00...,https://www.snap.com/political-ads/asset/78f06...,USD,2360,311317,2018/10/17 15:00:00Z,2018/11/07 04:00:00Z,Democratic Congressional Campaign Committee,"430 S Capitol St SE,Washington,20003,US",,DCCC,,18+,united states,,,,,,,,,,,,,,,,,,,,web_view_url:https://mypollingplace.org/
1,8fcd02787826bf69376f3de5d636ef83ac1ed7760cda83...,https://www.snap.com/political-ads/asset/782e1...,USD,1045,309853,2018/10/24 18:56:43Z,2018/11/07 00:00:59Z,Centro LLC,"11 E. Madison Ave. 6th Floor,,,Chicago,60602,US",,Save Animals Facing Extinction,,18-34,united states,Nevada,,,,,,,,,,,,,,Provided by Advertiser,,,,,web_view_url:http://protectesa.org/?utm_source...
2,ef6be28a3be48408c6f08b22bb405e518c57717198abc6...,https://www.snap.com/political-ads/asset/6bfcd...,USD,107,19452,2018/10/28 17:58:01Z,2018/11/06 22:59:59Z,Mothership Strategies,"1328 Florida Avenue NW, Building C, Washington...",,Progressive Turnout Project,,18+,united states,,,,,,,,,,,,,,,,,,,,web_view_url:http://votingmatters.org/
3,e3ec1ec0fbecc53e99c1dc2ddf5ce7a471b97a6f522152...,https://www.snap.com/political-ads/asset/a4316...,USD,66,5650,2018/10/19 21:12:44Z,2018/11/06 04:59:59Z,Blueprint Interactive,"1730 Rhode Island Ave NW Suite 1014,Washington...",,AFSCME Nevada,,18+,united states,Nevada,,,,,,,,,,,,,,,es,,,,
4,a45c9bacdd2ef9fd0b288fc9fb1065734bee705e595cac...,https://www.snap.com/political-ads/asset/5cae0...,USD,27,8841,2018/11/02 22:47:04Z,2018/11/07 01:00:00Z,Mothership Strategies,"1328 Florida Avenue NW, Building C, Washington...",,Voto Latino,,18-25,united states,,,Texas 23rd District,,,,,,,,,,,,,,,,,web_view_url:https://vota2018.org/
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
654,ab3782fff62bb34ea9d35fea49f4682636799ada345a21...,https://www.snap.com/political-ads/asset/d4043...,EUR,2814,1163315,2018/11/27 05:00:44Z,2019/01/13 11:31:15Z,Maxlead Services B.V.,"Wilhelminapark 17,Oegstgeest,2342 AD,NL",,Consumentenbond,,18-34,netherlands,,,,,,,,,,,,,,ANDROID,Provided by Advertiser,nl,,,,web_view_url:https://www.consumentenbond.nl/ac...
655,8caf171d51704d2c146c4e66a216867255760d393e02c1...,https://www.snap.com/political-ads/asset/2bb01...,USD,102,98863,2018/09/28 00:17:06Z,2018/09/30 10:15:51Z,Gorran Election Campaign,US,,Paid for by Balen Isamel,,18+,iraq,"Al Sulaymaniyah,Arbil,Dahouk",,,,,,,,,,,,,,Provided by Advertiser,,,,,
656,1b60806dce69445f7cf992c4d823f59b0b00e0faaf5c02...,https://www.snap.com/political-ads/asset/429bb...,USD,978,92681,2018/10/18 20:13:02Z,2018/11/06 23:00:00Z,Bully Pulpit Interactive,"1445 New York Ave NW,Washington,20005,US",,NextGen America,,18-34,united states,,,,,,,,,"18109,19437,18320,19539,19073,18067,18036,1947...",,,,,,,,,,,
657,70d53e039fa1cb2071e62e2eb2da223919538df4b79b38...,https://www.snap.com/political-ads/asset/9059c...,USD,781,67692,2018/10/18 17:11:20Z,2018/11/06 23:00:00Z,Bully Pulpit Interactive,"1445 New York Ave NW,Washington,20005,US",,NextGen America,,18-34,united states,,,,,,,,,"92691,92679,92782,92780,92808,92620,92610,9286...",,,,,,,,,,,


### Cleaning and EDA

In [30]:
# Collect the columns that have some, but not all, of their values missing.
n_rows = len(table)
lis = []
for col in table.columns:
    n_missing = table[col].isnull().sum()
    if 0 < n_missing < n_rows:
        lis.append(col)
lis

['EndDate',
 'Gender',
 'AgeBracket',
 'Regions (Included)',
 'Regions (Excluded)',
 'Electoral Districts (Included)',
 'Radius Targeting (Included)',
 'Radius Targeting (Excluded)',
 'Metros (Included)',
 'Postal Codes (Included)',
 'Location Categories (Included)',
 'Location Categories (Excluded)',
 'Interests',
 'OsType',
 'Segments',
 'Language',
 'AdvancedDemographics',
 'CreativeProperties']

### Assessment of Missingness

In [None]:
# TODO

### Hypothesis Test

In [None]:
# TODO