# Exploring the Babbage Soiree dataset

Some questions to guide the visualizations:
- How many years does this span? 
- How many events do we have only the year for? 
- What is the greatest number of events that occurred in one year? 
- How many people attended (in total)? How many came to a given party? 
- What is the balance of men and women overall? At a given party? 
- Do we have any sense of connectedness of the partygoers? Who was coming together? (co-attendance)
- When in someone's life were they attending these parties? What other things were happening in their lives to make them stop attending these parties? 


In [41]:
# import libraries
import pandas as pd
import altair as alt 

In [42]:
# read in file - what does it look like?
parties = pd.read_csv("parties.csv")
parties.head()

| | date | guest | qid | certainty_P1480 | sourceID | pages | quote |
|---|---|---|---|---|---|---|---|
| 0 | 1835-05 | Sylvain Van de Weyer | Q546727 | | Morgan1863 | 299 | Babbage’s party last night very pleasant; got ... |
| 1 | 1835-05 | Sydney Morgan | Q459275 | | Morgan1863 | 299 | Babbage’s party last night very pleasant; got ... |
| 2 | 1845-04-26 | Frances Joanna Bunbury | Q27255564 | | Bunbury2011a | 59 | April 26th. We went to Babbage's evening party... |
| 3 | 1845-04-26 | Charles Bunbury | Q1063834 | | Bunbury2011a | 59 | April 26th. We went to Babbage's evening party... |
| 4 | 1847-04-24 | Arethusa Milner Gibson | Q18810603 | | Bunbury2011b | 228 | Babbage's evening party and enjoyed it much. T... |


In [43]:
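# make the dates usable: some entries are partial (missing the day, or the month and day),
# so split each date into year / month / day parts and fill in the missing pieces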

parties_ = parties['date'].str.split(
    '-', 
    expand=True
    ).rename(columns={
        0:'year', 
        1:'month', 
        2:'day'})

# fill any missing month or day with the dummy value "01"
parties_.fillna("01", inplace=True)

parties_["date-imputed"] = parties_["year"] + "-" + parties_["month"] + "-" + parties_["day"]

parties = parties.join(parties_["date-imputed"])

parties["date-imputed"] = pd.to_datetime(parties["date-imputed"], format="%Y-%m-%d")
parties.info()





<class 'pandas.core.frame.DataFrame'>
RangeIndex: 124 entries, 0 to 123
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   date             124 non-null    object        
 1   guest            124 non-null    object        
 2   qid              124 non-null    object        
 3   certainty_P1480  12 non-null     object        
 4   sourceID         124 non-null    object        
 5   pages            106 non-null    object        
 6   quote            124 non-null    object        
 7   date-imputed     124 non-null    datetime64[ns]
dtypes: datetime64[ns](1), object(7)
memory usage: 7.9+ KB
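
Filling missing parts with `01` hides which dates were incomplete in the source. One way to keep track, and to start on the "year only" and "events per year" questions above, is to record each date's precision before imputing. This is a minimal sketch on made-up rows: the `sample` frame, its guest labels, and the `precision` column are illustrative and not part of parties.csv. It also assumes that rows sharing the same date string describe the same event, which may not hold for year-only entries.

```python
import pandas as pd

# Hypothetical rows in the same shape as the `date` column of parties.csv
sample = pd.DataFrame({
    "date": ["1835-05", "1845-04-26", "1845-05-03", "1839", "1839"],
    "guest": ["Guest A", "Guest B", "Guest C", "Guest D", "Guest E"],
})

# Number of "-"-separated parts: 1 = year only, 2 = year-month, 3 = full date
n_parts = sample["date"].str.count("-") + 1
sample["precision"] = n_parts.map({1: "year", 2: "month", 3: "day"})

# Attendance rows at each precision level
precision_counts = sample["precision"].value_counts()

# Distinct events for which only the year is known
year_only_events = sample.loc[sample["precision"] == "year", "date"].nunique()

# Distinct events per year, and the year with the most events
sample["year"] = sample["date"].str[:4]
events_per_year = sample.groupby("year")["date"].nunique()
busiest_year = events_per_year.idxmax()
```

With a `precision` column in place, charts that break dates out by month could filter to rows where the month is actually known instead of relying on remembering that January 1 is a dummy value.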


## How many parties did Babbage have, and how many people were in attendance at these? 

In [54]:
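# distinct party dates (32); entries that share an imputed dummy date count as one party
n_parties = parties["date-imputed"].nunique()

# span between the first and last recorded party: 6763 days, about 18.5 years
span = parties["date-imputed"].max() - parties["date-imputed"].min()
span_years = span.days / 365.25

# distinct guests (the only value displayed, since it is the last expression in the cell)
parties["guest"].nunique()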



77

Babbage hosted 32 parties (counted as distinct imputed dates) over about 18.5 years, with 77 different guests in total.

In [45]:
# the parties weren't huge...
alt.Chart(parties).mark_bar().encode(
    alt.X("year(date-imputed):T"),
    alt.Y("count(guest)")
)

His parties were relatively intimate: the tallest bars, for 1839 and 1840, show 25 recorded attendances each. Because the chart groups by year, 25 is an upper bound on the size of any single party.
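
Because the bar chart aggregates by year, a tall bar can come from one large party or from several smaller ones. A minimal sketch of separating the two, using a few made-up rows (the `demo` frame and its guest labels are hypothetical, not taken from parties.csv):

```python
import pandas as pd

# Hypothetical rows in the same shape as `parties` after date imputation
demo = pd.DataFrame({
    "date-imputed": pd.to_datetime([
        "1839-03-02", "1839-03-02",
        "1839-06-15", "1839-06-15",
        "1840-01-01", "1840-01-01", "1840-01-01",
    ]),
    "guest": ["Guest A", "Guest B", "Guest A", "Guest C",
              "Guest B", "Guest C", "Guest D"],
})

# Distinct guests recorded at each party date
guests_per_party = demo.groupby("date-imputed")["guest"].nunique()

# Attendance rows summed over each calendar year (what the bar chart shows)
records_per_year = demo.groupby(demo["date-imputed"].dt.year)["guest"].count()

largest_party = guests_per_party.max()
largest_party_date = guests_per_party.idxmax()
busiest_year = records_per_year.idxmax()
```

In this toy data the busiest year (1839, two parties) is not the year of the largest single party (1840), which is the distinction to check before reading party size off the yearly chart.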

In [52]:
# try breaking it out by month, but then we need to remember that anything Jan 01 is a dummy date...
alt.Chart(parties).mark_bar().encode(
    alt.X("year(date-imputed):T"),
    alt.Y("count(guest)"),
    color=alt.Color("month(date-imputed):O"),
    tooltip=["date-imputed", "count(guest)"]
)

## Who came most often to these parties? 

In [47]:
alt.Chart(parties).mark_circle().encode(
    alt.X("guest:N").sort("-size"),
    size="count(date-imputed)",
    color="count(date-imputed):O",
    tooltip=["count(date-imputed)"]
)

Ada Lovelace was the most frequent guest, with 8 parties attended. Sydney Morgan and Samuel Rogers were the next most frequent guests, with 6 visits each.
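
The same ranking can be read off a table instead of a chart. This is a sketch on hypothetical rows (the `demo` frame and guest labels are invented for illustration), counting distinct party dates per guest so that a guest listed twice for one party is counted once:

```python
import pandas as pd

# Hypothetical rows in the same shape as `parties` after date imputation
demo = pd.DataFrame({
    "date-imputed": pd.to_datetime([
        "1839-03-02", "1839-03-02", "1839-06-15", "1840-01-01", "1840-01-01",
    ]),
    "guest": ["Guest A", "Guest B", "Guest A", "Guest A", "Guest C"],
})

# Distinct party dates attended by each guest, most frequent first
visits = (
    demo.groupby("guest")["date-imputed"]
    .nunique()
    .sort_values(ascending=False)
)

# The few most frequent guests
top_guests = visits.head(3)
```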

In [48]:
alt.Chart(parties, title='Babbage Soiree Guests').mark_circle(size=50).encode(
    x="yearmonth(date-imputed):T",
    yOffset="jitter:Q",
    color=alt.Color('yearmonth(date-imputed):N').legend(None),
    tooltip=["guest"]
).transform_calculate(
    # Generate Gaussian jitter with a Box-Muller transform
    jitter="sqrt(-2*log(random()))*cos(2*PI*random())"
).properties(width=1100,height=100)
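
The jitter expression above is one half of the Box-Muller transform: two independent uniform draws `u1` and `u2` give `sqrt(-2*ln(u1)) * cos(2*pi*u2)`, which follows a standard normal distribution. A plain-Python sketch of the same formula follows; the function name `box_muller_jitter` and the seed are illustrative, and `1 - random()` is a guard added here because Python's `math.log(0)` raises an error.

```python
import math
import random
import statistics


def box_muller_jitter(rng: random.Random) -> float:
    """Return one standard-normal draw built from two uniform draws."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [box_muller_jitter(rng) for _ in range(10_000)]

# For a standard normal these should be close to 0 and 1 respectively
sample_mean = statistics.fmean(samples)
sample_sd = statistics.stdev(samples)
```

Gaussian jitter clusters most points near the center of the band with a few outliers, whereas a plain uniform `random()` offset would spread them evenly across it.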

In [49]:
alt.Chart(parties, title='Babbage Soiree Guests').mark_circle().encode(
    x="yearmonth(date-imputed):T",
    yOffset="jitter:Q",
    color=alt.condition(
        alt.datum.guest == "Ada Lovelace", 
        alt.value("darkorange"),
        alt.value("lightgray")),
    size=alt.condition(
        alt.datum.guest == "Ada Lovelace", 
        alt.value(80),
        alt.value(30)),
    tooltip=["guest", "date-imputed"]
).transform_calculate(
    # Generate Gaussian jitter with a Box-Muller transform
    jitter="sqrt(-2*log(random()))*cos(2*PI*random())"
).properties(width=1200,height=90)

In [50]:
alt.Chart(parties, title='Babbage Soiree Guests').mark_line().encode(
    x="yearmonthdate(date-imputed):T",
    yOffset="jitter:Q",
    color=alt.Color('guest:N').legend(None),
    tooltip=["guest", "date-imputed"]
).transform_calculate(
    # Generate Gaussian jitter with a Box-Muller transform
    jitter="sqrt(-2*log(random()))*cos(2*PI*random())"
).properties(width=1200,height=90)

In [51]:
search_input = alt.selection_point(
    fields=['guest'],
    empty=False,  # Start with no points selected
    bind=alt.binding(
        input='search',
        placeholder="Guest name",
        name='Search ',
    )
)

alt.Chart(parties, title='Babbage Soiree Guests').mark_circle().encode(
    x="yearmonth(date-imputed):T",
    yOffset="jitter:Q",
    color=alt.condition(
        search_input, 
        alt.value("darkorange"),
        alt.value("lightgray")),
    size=alt.condition(
        search_input, 
        alt.value(80),
        alt.value(30)),
    tooltip=["guest", "date-imputed"]
).transform_calculate(
    # Generate Gaussian jitter with a Box-Muller transform
    jitter="sqrt(-2*log(random()))*cos(2*PI*random())"
).add_params(
    search_input
).properties(width=1200,height=90)

## More ideas to think about

Temporal event-based data
- Check out Andrienko's work and Silvia Miksch's book on time-oriented data
- "Life lines"/historic genealogy representations, but with people meeting instead of marrying: every individual is a line traveling through time. Can we get some context on what they were doing in their lives outside of these parties? E.g., why did Ada stop attending in 1839?

Each node is a party
- Encode the number of party attendees as node area
- Encode the M vs. F split as portions of the node

"Centrality" of each person in the chart
- Expertise of the person, e.g., natural sciences, linguistics
- More time stats on the person: birth, death, major life events (when did they become really famous?)
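
The co-attendance question from the top of the notebook and the "centrality" idea both start from counting which pairs of guests appear at the same party. A rough sketch on invented rows (the `demo` frame, guest labels, and the `degree` measure are assumptions for illustration): guests are grouped by party date, every pair within a party is counted, and each guest's number of distinct co-attendees serves as a simple centrality proxy. Entries with imputed dummy dates would be grouped together here even if they were separate events, so they may need to be excluded first.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical rows in the same shape as `parties` after date imputation
demo = pd.DataFrame({
    "date-imputed": pd.to_datetime([
        "1839-03-02", "1839-03-02",
        "1839-06-15", "1839-06-15", "1839-06-15",
        "1840-02-10", "1840-02-10",
    ]),
    "guest": ["Guest A", "Guest B",
              "Guest A", "Guest B", "Guest C",
              "Guest C", "Guest D"],
})

# Count how many parties each pair of guests attended together
pair_counts = Counter()
for _, guests in demo.groupby("date-imputed")["guest"]:
    # Sorting gives each pair a single canonical order, e.g. (A, B) never (B, A)
    for pair in combinations(sorted(set(guests)), 2):
        pair_counts[pair] += 1

# Simple centrality proxy: number of distinct co-attendees per guest
degree = Counter()
for first, second in pair_counts:
    degree[first] += 1
    degree[second] += 1
```

The `pair_counts` mapping is effectively a weighted edge list, so it could feed a network layout in which edge weight is the number of shared parties.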
