**PySDS Week 02 Day 03 v.1 - Exercise - Dates and more DataFrames**

Today we will continue to use the PySDS_PolCandidates.csv table and answer some more involved questions as DataFrame practice. 

These questions are not directly related to much of today's material, so I would like you to begin with a few practice exercises on parsing datetimes. Then, using only filters, grouping and other features of DataFrames, you should be able to answer the questions below. 

In [171]:
# Date Parsing exercises: 
from datetime import datetime
from datetime import timezone

time_now = datetime.now(timezone.utc) # specify the timezone
time_1 =  "June 20, 1985 12:35pm"
time_2 =  "10/10/10 10:10:10 +1000" # hint, the +1000 means UTC +10 hours 
time_3 = "534567890" # Unix timestamp (seconds, UTC); hint: datetime.fromtimestamp(xx, tz=timezone.utc)

# Question 1. Using now(), which I realise will be a slightly 
# different time for everyone, report the time elapsed between 
# now and each of times 1, 2 and 3.

# Question 2. For each of the times above, what day of the week was it? 


# Answer
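time_1 = datetime.strptime(time_1, '%B %d, %Y %I:%M%p').astimezone() # %I (12-hour clock) pairs with %p; astimezone() makes it timezone-aware using the local timezone
time_2 = datetime.strptime(time_2, '%d/%m/%y %H:%M:%S %z') # %z parses the +1000 offset, so this is already timezone-aware
time_3 = datetime.fromtimestamp(int(time_3), tz=timezone.utc) # timezone-aware UTC; utcfromtimestamp() returns a naive datetime that astimezone() would read as local time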


print(time_now-time_1)
print(time_now-time_2)
print(time_now-time_3)

weekdaymap = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'] # 0-6 mapped to day names

print('time1', weekdaymap[time_1.weekday()]) # returns weekday, uses as index in weekdaymap
print('time2', weekdaymap[time_2.weekday()])
print('time3', weekdaymap[time_3.weekday()])

# Reviewer comments 




12173 days, 9:19:39.011223
2930 days, 20:44:29.011223
11635 days, 17:49:49.011223
time1 Thursday
time2 Sunday
time3 Wednesday
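
As a cross-check on the parsing above, here is a minimal sketch that rebuilds the three timestamps as timezone-aware values and reads the weekday name with `strftime('%A')` instead of a lookup list. It assumes `time_1` is meant as UTC, which the exercise does not state, and the names `t1`, `t2`, `t3` are illustrative only.

```python
from datetime import datetime, timezone

# %I (12-hour clock) is the directive that pairs with %p;
# with %H the am/pm marker has no effect on the parsed hour.
t1 = datetime.strptime("June 20, 1985 12:35pm", "%B %d, %Y %I:%M%p")
t1 = t1.replace(tzinfo=timezone.utc)  # assumption: treat the naive value as UTC

# %z reads the "+1000" offset, so the result is already timezone-aware.
t2 = datetime.strptime("10/10/10 10:10:10 +1000", "%d/%m/%y %H:%M:%S %z")

# Passing tz= gives an aware UTC datetime directly from the epoch seconds.
t3 = datetime.fromtimestamp(534567890, tz=timezone.utc)

for name, t in [("time1", t1), ("time2", t2), ("time3", t3)]:
    # %A is locale-dependent; English day names assume the default locale.
    print(name, t.isoformat(), t.strftime("%A"))

# Aware datetimes can be subtracted even when their UTC offsets differ.
print(t2 - t3)
```

Because all three values carry a timezone, the differences do not depend on the timezone of the machine running the cell.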


In [173]:
# Extended exercise part 1. 

# Using the data "PySDS_PolCandidates.csv" fill in the DataFrame below 
# with data. Also, try to ensure that it is formatted nicely. 
import pandas as pd 

media_combo_df = pd.DataFrame(columns=["Labour Party","Conservative Party","Total"],
                             index=["None",
                                   "Only Twitter",
                                   "Only Facebook",
                                   "Only Webpage",
                                   "Twitter and Facebook",
                                   "Facebook and Webpage",
                                   "Twitter and Webpage",
                                   "Twitter, Facebook and Webpage"
                                   ])

# Each cell should be a count of candidates. For example, the 
# [None, Labour] cell is the number of Labour candidates
# who had none of Twitter, a webpage or Facebook. 

# Here are some hints: If you ensure that the empty columns in the 
# PolCandidates.csv file are null, you can then use boolean logic to 
# select your variables. For example, 
# x = df['have_twitter'].notnull()
# y = df['have_facebook'].notnull() 
# then 
# have_both = df[x & y] 
# will get you the rows of the people who have both and 
# have_both['party'].value_counts() 
# will get you the count, by party, of the people 
# who have both twitter and facebook. 

filepath = ''

df = pd.read_csv(filepath+'PySDS_PolCandidates.csv')

# set up expressions for users with each service
x = df['twitter_username'].notnull()
y = df['facebook_page_url'].notnull()
z = df['party_ppc_page_url'].notnull()
lc = df['party'].isin(['Labour Party', 'Conservative Party']) # only those in lab/con parties

# go through different boolean configurations of having the services
bools = {
    'None': ~x & ~y & ~z,
    'Only Twitter': x & ~y & ~z,
    'Only Facebook': ~x & y & ~z,
    'Only Webpage': ~x & ~y & z,
    'Twitter and Facebook': x & y & ~z,
    'Facebook and Webpage': ~x & y & z,
    'Twitter and Webpage': x & ~y & z,
    'Twitter, Facebook and Webpage': x & y & z,
}

# iterate through index names and different configurations
for k, v in bools.items():
    media_combo_df.loc[k] = df[lc&v]['party'].value_counts() # value counts for those in lab & con and that satisfy the combination of tw,fb,web
    media_combo_df.loc[k, 'Total'] = df[v]['party'].value_counts().sum() # sum of the total value counts for all parties

media_combo_df = media_combo_df.fillna(0) # fill the remaining nas with 0
display(media_combo_df)


,Labour Party,Conservative Party,Total
None,29.0,0.0,549
Only Twitter,211.0,2.0,672
Only Facebook,2.0,0.0,65
Only Webpage,9.0,73.0,395
Twitter and Facebook,66.0,1.0,313
Facebook and Webpage,4.0,28.0,113
Twitter and Webpage,180.0,318.0,1064
"Twitter, Facebook and Webpage",88.0,209.0,800

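The same table can also be built without a loop. The sketch below is one possible alternative, not the method used above: it labels each candidate with their combination of services and lets `pd.crosstab` do the counting. The `toy` DataFrame is hypothetical stand-in data that reuses the column names from the answer, and `combo_label` is an illustrative helper.

```python
import pandas as pd

# Hypothetical toy data standing in for PySDS_PolCandidates.csv.
toy = pd.DataFrame({
    "party": ["Labour Party", "Labour Party", "Conservative Party", "Green Party"],
    "twitter_username": ["a", None, "c", None],
    "facebook_page_url": [None, None, "f", None],
    "party_ppc_page_url": ["w", None, "w", None],
})

# Display name of each service mapped to the column that records it.
services = {
    "Twitter": "twitter_username",
    "Facebook": "facebook_page_url",
    "Webpage": "party_ppc_page_url",
}

def combo_label(row):
    """Name the combination of services a candidate has."""
    have = [name for name, col in services.items() if pd.notnull(row[col])]
    if not have:
        return "None"
    if len(have) == 1:
        return "Only " + have[0]
    return ", ".join(have[:-1]) + " and " + have[-1]

toy["combo"] = toy.apply(combo_label, axis=1)

# Rows are combinations, columns are parties, cells are candidate counts.
counts = pd.crosstab(toy["combo"], toy["party"])
counts["Total"] = counts.sum(axis=1)  # total over all parties, not only the two shown
table = counts[["Labour Party", "Conservative Party", "Total"]]
print(table)
```

Combinations that never occur are simply absent from the result, so on real data a `reindex(..., fill_value=0)` with the full list of row labels would be needed to guarantee all eight rows.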

In [174]:
# Extended exercise part 2. 

# The raw counts in the table are useful, 
# but showing the relative percentage would be even more useful. 
# Create a new table that is formatted like the above, however, in 
# this table show the percent of the column total. 
# So for Labour that would be the percentage of Labour candidates
# who had 'only webpage', not the percentage of all candidates who
# are Labour and only have a webpage. 

# Hint to display a DataFrame as a percentage, try this: 
# df = pd.DataFrame(pd.Series(range(10))/10,columns=["var1"])
# df['var2'] = df['var1'].map(lambda n: '{:,.1%}'.format(n))

# display(df)
# Answer below here

# divide each column by its own total to get fractions
perc_df = media_combo_df/media_combo_df.sum() 

# convert to percentage
for i in perc_df:
    perc_df[i] = perc_df[i].map(lambda n: '{:,.1%}'.format(n))

display(perc_df)
# Reviewers comments below here




,Labour Party,Conservative Party,Total
None,4.9%,0.0%,13.8%
Only Twitter,35.8%,0.3%,16.9%
Only Facebook,0.3%,0.0%,1.6%
Only Webpage,1.5%,11.6%,9.9%
Twitter and Facebook,11.2%,0.2%,7.9%
Facebook and Webpage,0.7%,4.4%,2.8%
Twitter and Webpage,30.6%,50.4%,26.8%
"Twitter, Facebook and Webpage",14.9%,33.1%,20.1%

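Formatting the percentages in place turns every cell into a string, which makes later arithmetic awkward. A minimal sketch of one alternative, using hypothetical counts rather than the real table, is to keep a numeric frame and format a separate copy for display:

```python
import pandas as pd

# Hypothetical counts, shaped like media_combo_df but much smaller.
counts = pd.DataFrame(
    {"Labour Party": [1, 3], "Conservative Party": [2, 2], "Total": [5, 15]},
    index=["None", "Only Twitter"],
)

# counts.sum() is a Series of column totals; division aligns on column
# labels, so each column is scaled by its own total.
fractions = counts / counts.sum()

# Format a copy for display and keep `fractions` numeric.
formatted = fractions.apply(lambda col: col.map("{:.1%}".format))
print(formatted)
```

With this split, `fractions.sum()` can still be checked numerically while `formatted` is only used for presentation.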

In [176]:
# Extended exercise part 3. 

# Sum each of the columns in the previous exercise. 
# Do each of the columns sum to 100%? They should. 
# Use this exercise as a check that 
# each column sums to the expected total. 

# hint. 
# print(df["var1"].sum())

# Answer below here

for i in perc_df.columns:
    print(i, perc_df[i].apply(lambda x: float(x[:-1])).sum()) # remove '%' and convert string to float, then sum column

# discrepancy from 100% comes from rounding to 1 decimal place in previous cell

# Reviewers comments below here 




Labour Party 99.89999999999999
Conservative Party 100.0
Total 99.80000000000001
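
The rounding effect noted in the answer can be reproduced in isolation. This sketch uses a hypothetical column of three equal shares: the exact fractions sum to 100%, while the values parsed back from one-decimal strings do not. Each rounded value can be off by at most 0.05 percentage points, so a tolerance of 0.05 times the number of rows is a reasonable bound for the check.

```python
import math

shares = [1 / 3, 1 / 3, 1 / 3]  # hypothetical column of exact fractions

# Sum of the unrounded values, expressed as a percentage.
exact_total = sum(shares) * 100

# Sum after rounding each value to one decimal place, as in the formatted table.
rounded_total = sum(float("{:.1%}".format(s)[:-1]) for s in shares)

print(exact_total, rounded_total)
print(math.isclose(exact_total, 100.0))
print(math.isclose(rounded_total, 100.0, abs_tol=0.05 * len(shares)))
```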
