In [1]:
import pandas as pd
import altair as alt

1. A list of the tasks you were initially given by the “users”
- Which squirrels engage in more than one activity?
- Are there any squirrels who are noted as being skinny? If so, in which parks? Which squirrels?
- What kind of temperature and weather environment do squirrels prefer?
- Do park conditions influence squirrel activities?
- Which parks have the most gray squirrels?
- Which park has the most squirrel sightings?
- Which activity is observed the most in squirrels across parks?
- In what park will we most likely encounter friendly squirrels that will approach people? More specifically, which park has the highest proportion of squirrels that are friendly/approach people?

2. The intended audience (as specified by the “users”)

The target audiences are:
- Park managers: they may want to know which parks have the highest numbers of (gray) squirrels so they can make appropriate adjustments to how they manage the park's ecosystem and wildlife.
- Wildlife researchers: they may be interested in the distribution of squirrels across different parks as part of their studies.
- Public/Tourists: people who are interested in wildlife or in squirrel sightings may find this information interesting or useful.
- Journalists/Photographers: journalists covering wildlife or environmental issues may use this as part of their reporting.


3. Which tasks you chose to prioritize for your design and how your design helps the user complete these tasks.
We chose the following four tasks:
- Task 1: Which parks have the most gray squirrels?
- Task 2: Which park has the most squirrel sightings?
- Task 3: Which activity is observed the most in squirrels across parks?
- Task 4: In what park will we most likely encounter friendly squirrels that will approach people? More specifically, which park has the highest proportion of squirrels that are friendly/approach people?


4. The initial sketches you created (all sketches from the Sketch stage, group based on task, not by individual, specify which individual created each sketch)
- Task 1: 
Edison Le:

![Alt text](Task1EL.png)
 
- Task 2:
Edison Le:

![Alt text](Task2EL.png)

- Task 3: 
Edison Le:

![Alt text](Task3EL.png)
 
- Task 4:
Edison Le:

![Alt text](Task4EL.png)
 
5. The favorite sketch and why it was selected (1 paragraph as mentioned in Winnowing part of Decide stage)

- For task 1, every member sketched a bar chart. A bar chart is a good choice here because we are comparing the number of gray squirrels across all parks. It shows the data as bars of equal width, with the height of each bar proportional to the value it represents, in this case the number of gray squirrels. Because the bars share a common axis, readers can compare their heights accurately and see at a glance which parks have the most gray squirrels. Bar charts are also simple and familiar, which makes them a good choice for communicating the data to a wide range of audiences.

- For task 2, every member sketched a bar chart. The rationale is the same as for task 1: a bar chart is an appropriate way to compare a single quantity across categories, in this case the number of squirrel sightings in each park. The bars have equal width, and their heights are proportional to the number of sightings. With a common axis, comparing bar heights makes it easy to see accurately which parks have the most squirrel sightings. Bar charts are also easy to read, which makes them well suited to a broad audience.

- For task 3, every member sketched a bar chart. We chose a bar chart because it shows the count of each activity observed in squirrels across parks. The x-axis represents the count of each activity, while the y-axis lists the activities, such as eating, running, and climbing. Because every bar is measured against the same axis, viewers can quickly compare the counts and identify the most observed activity. Its simplicity also makes it easy for the general public to understand.

- For task 4, every member sketched a normalized stacked bar chart. A normalized stacked bar chart is suitable because every bar spans the same total length, so we can visually compare the proportion of friendly squirrels across parks rather than the raw counts. This makes it straightforward to find the park with the highest proportion of friendly squirrels. The x-axis represents the proportion of friendly squirrels, while the y-axis lists the parks. The design is simple enough to be understood by people of all backgrounds.

In [2]:
data = pd.read_csv('squirrel-data.csv', encoding='ISO-8859-1')

In [3]:
data.shape
data.info()
data.head(5)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 433 entries, 0 to 432
Data columns (total 16 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   Area Name                        433 non-null    object 
 1   Area ID                          433 non-null    object 
 2   Park Name                        433 non-null    object 
 3   Park ID                          433 non-null    int64  
 4   Squirrel ID                      433 non-null    object 
 5   Primary Fur Color                432 non-null    object 
 6   Highlights in Fur Color          339 non-null    object 
 7   Color Notes                      10 non-null     object 
 8   Location                         399 non-null    object 
 9   Above Ground (Height in Feet)    112 non-null    object 
 10  Specific Location                89 non-null     object 
 11  Activities                       378 non-null    object 
 12  Interactions with Huma

Unnamed: 0,Area Name,Area ID,Park Name,Park ID,Squirrel ID,Primary Fur Color,Highlights in Fur Color,Color Notes,Location,Above Ground (Height in Feet),Specific Location,Activities,Interactions with Humans,Other Notes or Observations,Squirrel Latitude (DD.DDDDDD),Squirrel Longitude (-DD.DDDDDD)
0,UPPER MANHATTAN,A,Fort Tryon Park,1,A-01-01,Gray,White,,Ground Plane,,,Foraging,Indifferent,,40.85941,-73.933936
1,UPPER MANHATTAN,A,Fort Tryon Park,1,A-01-02,Gray,White,,Ground Plane,,,Foraging,Indifferent,Looks skinny,40.859436,-73.933937
2,UPPER MANHATTAN,A,Fort Tryon Park,1,A-01-03,Gray,White,,Ground Plane,,,"Eating, Digging something",Indifferent,,40.859416,-73.933894
3,UPPER MANHATTAN,A,Fort Tryon Park,1,A-01-04,Gray,White,,Ground Plane,,,Running,Indifferent,,40.859418,-73.933895
4,UPPER MANHATTAN,A,Fort Tryon Park,1,A-01-05,Gray,Cinnamon,,Ground Plane,,,"Running, Eating",Indifferent,She left food,40.859493,-73.93359


In [7]:
#Q: Which parks have the most gray squirrels?
gray = data.loc[(data['Primary Fur Color'] == 'Gray')]
park_counts_gray = gray.groupby('Park Name').size().reset_index(name='Count')
park_counts_gray['Count_str'] = park_counts_gray['Count'].astype(str)

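# Gray color scale built from this cell's own counts (previously it referenced
# park_counts_sightings, which is only defined in a later cell).
# Currently unused: the color channel was removed from the chart below.
color_scale = alt.Scale(
    domain=list(park_counts_gray['Count']),
    type='linear',
    range=['#9ba2ab','#85898f'],
    reverse=True
)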

alt.Chart(park_counts_gray).mark_bar().encode(
    x=alt.X('Count:Q', title='Number of Gray Squirrels'),
    y=alt.Y('Park Name:N', sort='-x', title='Park Name'),
    tooltip=[alt.Tooltip('Park Name:N', title='Park'), alt.Tooltip('Count:Q', title='Count')]
).properties(
    title={
        'text': 'Number of Gray Squirrels in Each Park',
        'fontSize': 20,
        'font': 'Helvetica'
    }
).configure_axis(
    gridDash=(1, 1),
    domain=False
).configure_text(
    fontSize=12,
    font='Helvetica'
)


In [9]:
#Q: Which park has the most squirrel sightings?

park_counts_sightings = data.groupby('Park Name').size().reset_index(name='Count')
park_counts_sightings['Count_str'] = park_counts_sightings['Count'].astype(str)

color_scale = alt.Scale(
    domain=list(park_counts_sightings['Count']),
    type='linear',
    range=['#3578c4', '#3560c4'],
    reverse=True
)
#remove color channel
alt.Chart(park_counts_sightings).mark_bar().encode(
    y = alt.Y('Park Name:N', sort='-x', title='Park Name'),  # largest count first, matching the gray-squirrel chart
    x = alt.X('Count:Q', title = 'Squirrel Sightings'),
    tooltip=[alt.Tooltip('Park Name:N', title='Park'), alt.Tooltip('Count:Q', title='Count')]
).properties(
    title={
        'text': 'Number of Squirrel Sightings in Each Park',
        'fontSize': 20,
        'font': 'Helvetica'
    }
)

In [15]:
#Q: Which activity is observed the most in squirrels across parks?
unique_activities = data['Activities'].unique()
# print(unique_activities)


replace_dict = {
    '.*Chasing.*': 'Chasing',
    '.*Climbing.*': 'Climbing',
    '.*Running.*' : 'Running',
    '.*Chill.*' : 'Chilling',
    '.*Lounging.*' : 'Chilling',
    '.*Rest.*' : 'Chilling',
    '.*Eating.*' : 'Eating',
    '.*Hang.*' : 'Hanging',
    '.*Sitting.*' : 'Sitting',
    'Sleeping.*' : 'Sleeping',
    '.*Foraging.*' : 'Foraging',
    '.*Jump.*' : 'Jumping',
    '.*Prancing.*' : 'Jumping',
    '.*Nesting.*' : 'Nesting',
    '.*Snack.*' : 'Eating',
    '.*Digging.*' : 'Digging',
    '.*Burying.*' : 'Digging',
    '.*shouting.*' : 'Shouting',
    '.*Vocal.*' : 'Shouting',
    '.*Scratch.*' : 'Scratching',
    '.*scratch.*' : 'Scratching',
    '.*Cleaning.*' : 'Cleaning',
    '.*Defend.*' : 'Guarding',
    '.*battery.*' : 'Other',
    '.*Very.*' : 'Other',
    '.*Sticking.*' : 'Other',
    '.*Hanging.*' : 'Chilling',
    '.*Frolicking.*' : 'Other',
    '.*Posing.*' : 'Other',
}

data['Activities'] = data['Activities'].replace(replace_dict, regex=True)


park_activity_prop = data.groupby(['Park Name', 'Activities']) \
                         .size().reset_index(name='count') \
                         .assign(prop=lambda df: df['count'] / df.groupby('Park Name')['count'].transform('sum'))
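
Because every pattern in `replace_dict` is wrapped in `.*`, an entry that lists several activities, such as "Running, Eating", ends up with a single label, and which label it gets depends on the order of the patterns. The following is a minimal standard-library sketch of that first-match-wins idea, not the notebook's actual method; the helper `categorize` and the shortened `RULES` list are illustrative assumptions.

```python
import re

# Illustrative subset of the notebook's patterns, checked in this order.
# (Hypothetical helper data; the full mapping lives in replace_dict.)
RULES = [
    (r"Chasing", "Chasing"),
    (r"Climbing", "Climbing"),
    (r"Running", "Running"),
    (r"Chill|Lounging|Rest", "Chilling"),
    (r"Eating|Snack", "Eating"),
    (r"Foraging", "Foraging"),
    (r"Digging|Burying", "Digging"),
]

def categorize(raw, default="Other"):
    """Return the label of the first rule whose pattern occurs in `raw`."""
    for pattern, label in RULES:
        if re.search(pattern, raw):
            return label
    return default

print(categorize("Running, Eating"))            # Running
print(categorize("Eating, Digging something"))  # Eating
print(categorize("Posing"))                     # Other
```

Making the order explicit shows that a multi-activity sighting is counted only once, under whichever rule matches first.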



## After looking at the stacked bar chart, I gotta say it's pretty messy. So I've added a regular bar chart below. Decide which one and then submit. -Minghao

In [16]:
act_counts = data.groupby('Activities').size().reset_index(name='Count')
act_counts['Count_str'] = act_counts['Count'].astype(str)
alt.Chart(act_counts).mark_bar().encode(
    y = alt.Y('Activities:N', sort='-x', title='Activity'),  # most observed activity first
    x = alt.X('Count:Q', title = 'Count across parks'),
    tooltip=[ alt.Tooltip('Count:Q', title='Count')]
).properties(
    title={
        'text': 'Number of Activities in Squirrels',
        'fontSize': 20,
        'font': 'Helvetica'
    }
)
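
Some raw `Activities` entries list more than one activity separated by commas (for example "Running, Eating" in the preview rows), so counting one label per sighting can undercount the later activities. A hedged sketch of an alternative that counts every listed activity is shown here on a toy series; it assumes comma-separated values and would need to run on the raw column, before the regex replacement overwrites it.

```python
import pandas as pd

# Toy stand-in for the raw data['Activities'] column (values are illustrative).
raw = pd.Series(["Foraging", "Eating, Digging something", "Running, Eating", None])

activity_counts = (
    raw.dropna()            # skip sightings with no recorded activity
       .str.split(",")      # "Running, Eating" -> ["Running", " Eating"]
       .explode()           # one row per listed activity
       .str.strip()         # drop the space left after each comma
       .value_counts()      # count each activity, most frequent first
)
print(activity_counts)
```

The free-text variants (such as "Digging something") would still need to be mapped to categories afterwards.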

In [22]:
#Q: In what park will we most likely encounter friendly squirrels that will approach people?
#More specifically, which park has the highest proportion of squirrels that are friendly/approach people?

unique_interact = data['Interactions with Humans'].unique()
# print(unique_interact)

replace_dict2 = {
    '.*Indifferent.*' : 'Not Friendly',
    'Runs From' : 'Not Friendly',
    '.*Watches us from tree.*' : 'Not Friendly',
    'Runs From, watchful' : 'Not Friendly',
    '.*Not Friendly.*' : 'Not Friendly',
    'Defensive' : 'Not Friendly',
    'Watching' : 'Not Friendly',
    'Staring' : 'Not Friendly',
    '.*Skittish.*' : 'Not Friendly',
    '.*Cautious.*' : 'Not Friendly',
    '.*interested.*' : 'Friendly',
    '.*Approaches.*' : 'Friendly',
    'Okay with people' : 'Friendly',
    'Approaches, Runs From' : 'Friendly',
    '.*Interested.*' : 'Friendly',
    'Not Friendly, watchful' :   'Not Friendly',
    'Not Friendly, watches us in short tree ' :   'Not Friendly',
}

data['Interactions with Humans'] = data['Interactions with Humans'].replace(replace_dict2, regex=True)

park_interaction = data.groupby(['Park Name', 'Interactions with Humans']) \
                         .size().reset_index(name='count') \
                         .assign(prop=lambda df: df['count'] / df.groupby('Park Name')['count'].transform('sum'))

colors = ['#36cf45', '#cf3645']

alt.Chart(park_interaction).mark_bar().encode(
    y=alt.Y('Park Name:N', sort='-y', title='Park Name'),
    x=alt.X('prop:Q', stack='normalize', axis=alt.Axis(format='%'), title='Proportion of Interactions'),
    color=alt.Color('Interactions with Humans:N', scale=alt.Scale(range=colors)),
    tooltip=['Park Name', 'Interactions with Humans', 'count']
).properties(
    title={
        'text': 'Proportion of Friendly and Not Friendly Squirrels in Each Park',
        'fontSize': 20,
        'font': 'Helvetica'
    }
)
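
To answer task 4 with a number rather than by eye, the friendly share per park can also be computed and ranked directly. This is a minimal sketch on made-up rows shaped like the recoded columns; the toy park names and values are assumptions, and sightings without a recorded interaction are dropped first so the shares are taken over recorded interactions only.

```python
import pandas as pd

# Toy rows shaped like the recoded notebook columns (all values are made up).
toy = pd.DataFrame({
    "Park Name": ["Park A", "Park A", "Park A", "Park B", "Park B", "Park B"],
    "Interactions with Humans": ["Friendly", "Not Friendly", "Not Friendly",
                                 "Friendly", "Friendly", None],
})

recorded = toy.dropna(subset=["Interactions with Humans"])

friendly_share = (
    recorded["Interactions with Humans"].eq("Friendly")  # True/False per sighting
        .groupby(recorded["Park Name"]).mean()            # share of True per park
        .sort_values(ascending=False)                     # friendliest park first
)
print(friendly_share)
```

Parks with very few recorded interactions can reach a high share by chance, so the counts are worth reporting alongside the proportions.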

In [23]:
data['Interactions with Humans'].value_counts()

Not Friendly    302
Friendly         40
Name: Interactions with Humans, dtype: int64