## Design Process
Given the interactive, multi-view nature of the visualization, the design process was iterative and covered all the views at once. From the start, we tried to keep the design as simple as possible, avoiding unnecessary elements and focusing on answering the questions clearly.
With this objective in mind, we began by sketching the overall design of the visualization, shown in the following image. This first sketch focused on deciding how the views created for the first course assignment could be reused to answer the questions. The main idea was to use the map to show the location of each accident (as dots) and the number of accidents per borough (as a choropleth). A bar chart would show the accidents per vehicle type, the time series would show the accidents per hour, and a second bar chart would show the accidents per month. Finally, a lollipop chart would show, for each weather condition, the difference from the mean number of accidents per day. The view created for the first assignment is also included to make the design easier to follow.

![image-2.png](attachment:image-2.png)
![image.png](attachment:image.png)


Through interactions such as selecting bars, points and boroughs on the map, or an interval of time, we felt that we could answer some but not all of the questions. However, before refining the design further, we decided to implement the basics of the visualization in order to validate whether the design was technically feasible. We already knew that the design would have to change, as the views we had created could not answer all the questions, especially those that involved selecting specific days of the week and weeks in a month.


### First Prototypes
We began by implementing the views independently of each other, each with interactions that affected only itself. This made it easier to debug problems and to check the viability of the visualization, as both Altair and Streamlit have known bugs and issues that rule out some options.

In the following sections, the design process for each view (without taking into account inter-view interactions) is described, together with the final design and analysis of the view.

#### Map 
We began by implementing the interactive map visualization. Our objective was to create a view that used a choropleth to show the number of accidents per borough (a `mark_geoshape` encoding the count as color) with the position of each accident (a `mark_point` encoding the coordinates) overlaid on it. Furthermore, we wanted to be able to select a borough or a group of points, which would highlight them and update the other views so that they only showed the data corresponding to the selection.

However, we encountered several problems caused by the inner workings of Altair and by an issue that Streamlit and Altair have with GeoDataFrames. It is a known issue ([GitHub issue #1002](https://github.com/streamlit/streamlit/issues/1002)) that, when rendering an Altair map chart in Streamlit, a remote data source must be used and a GeoDataFrame cannot be used. This limits the ability to make an interactive choropleth for two reasons:
- If the accident dataset is used as the data source and the geometry is looked up, Streamlit raises an error due to the known bug.
- If the geometry is used as the data source and the accident data is looked up, it does not work properly, as the lookup is a one-sided join rather than an inner join.
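The difference between the two kinds of join can be illustrated without Altair. The following is a minimal plain-Python sketch, not the actual lookup implementation: the rows, the helper names `one_sided_lookup` and `inner_join`, and the choice to keep the first match are all assumptions made for illustration.

```python
# Hypothetical miniature tables: one row per borough, several rows per accident.
boroughs = [{"name": "Bronx"}, {"name": "Queens"}, {"name": "Staten Island"}]
accidents = [
    {"name": "Bronx", "hour": 8},
    {"name": "Bronx", "hour": 17},
    {"name": "Queens", "hour": 9},
]


def one_sided_lookup(primary, secondary, key, default=None):
    """Keep exactly one output row per primary row, attaching a single match."""
    index = {}
    for row in secondary:
        # Assumption of this sketch: the first matching row wins.
        index.setdefault(row[key], row)
    return [{**p, "match": index.get(p[key], default)} for p in primary]


def inner_join(primary, secondary, key):
    """Emit one output row per matching (primary, secondary) pair."""
    return [{**p, **s} for p in primary for s in secondary if p[key] == s[key]]


looked_up = one_sided_lookup(boroughs, accidents, "name")
joined = inner_join(boroughs, accidents, "name")

print(len(looked_up), len(joined))
```

With the boroughs as the primary table, the one-sided lookup yields one row per borough and carries a single accident with it, so the remaining accidents are not available for counting. The inner join keeps every accident and drops boroughs without any.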

Therefore, we decided to use a map of the boroughs as a base layer, without encoding the count as color, with the accident locations overlaid on it. We also decided to implement the following interactions:
- When one or more boroughs are clicked, the opacity of the other boroughs is reduced to 0.2 and only the accidents that happened in the selected boroughs are shown.
- When an area of the map is selected, the selected accidents are highlighted and the other views are updated to only show the data corresponding to the selection.

To make the exact number of accidents in each borough easy to read, a bar chart was added to the map view. This bar chart shows the number of accidents per borough and is updated when a selection is made. We did not use an interactive tooltip on the chart, as technical limitations made it impossible.

The visualization is shown in the following cell:

In [30]:
mapa = get_map()
ny_df, bur = get_buroughs(mapa)
# normalize bur and accident data BoroName
bur = bur.reset_index()
# create properties.name column equal to BoroName
accident_data['properties.name'] = accident_data['BoroName']

# weekday number (Monday=0 ... Sunday=6)
accident_data['weekday'] = accident_data['date'].dt.weekday
# weekend flag: 1 for Saturday/Sunday, 0 otherwise
accident_data['weekend'] = (accident_data['weekday'] > 4).astype(int)

# ISO week number (Series.dt.week is deprecated; use isocalendar() instead)
accident_data['week'] = accident_data['date'].dt.isocalendar().week.astype(int)
# month column
accident_data['month'] = accident_data['date'].dt.month

# for each month get the minimum week number
min_week = accident_data.groupby(['month'])['week'].min().reset_index()
# merge with accident data (the two 'week' columns become week_x and week_y)
accident_data = pd.merge(accident_data, min_week, on='month', how='left')
# week index within the month, starting at 1
accident_data['week'] = accident_data['week_x'] - accident_data['week_y'] + 1
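
The week-within-month index computed in the cell above (ISO week number minus the smallest ISO week number observed in the same month, plus one) can be checked on a handful of dates. The following standard-library sketch mirrors that logic; the helper name `week_of_month` and the dates are hypothetical and are not taken from the dataset.

```python
from datetime import date


def week_of_month(dates):
    """Return, for each date, its ISO week minus the first ISO week seen in its month, plus 1."""
    first_week = {}
    for d in dates:
        week = d.isocalendar()[1]
        first_week[d.month] = min(first_week.get(d.month, week), week)
    return [d.isocalendar()[1] - first_week[d.month] + 1 for d in dates]


# Hypothetical sample dates (the year is an assumption for illustration only).
sample = [
    date(2018, 6, 1),   # Friday
    date(2018, 6, 4),   # Monday of the following ISO week
    date(2018, 6, 30),  # Saturday
    date(2018, 7, 1),   # Sunday, same ISO week as June 30
    date(2018, 7, 2),   # Monday
]
print(week_of_month(sample))  # [1, 2, 5, 1, 2]
```

This works for the summer months used here. It would need extra care around the turn of the year, where the first days of January can belong to ISO week 52 or 53 of the previous year.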





In [32]:

colors = {"bg": "#eff0f3", "col1": "#d8b365", "col2": "#5ab4ac"}
w = 600
h = 400
ratio = 0.2
# accident_data = get_weather_data(data,fname = "weather.csv")
print(accident_data.columns)
# selection_multi is deprecated in Altair 5; selection_point supports multi-select via shift-click
selection_cond = alt.selection_point(on="click", empty=True, fields=["conditions"])
selection_buro = alt.selection_point(fields=["properties.name"], empty=True)
month_dropdown = alt.binding_select(options=[[6,7,8],6,7,8])
selection_month = alt.selection_point(fields=['month'], bind=month_dropdown)

weekday_dropdown = alt.binding_select(options=[[0,1,2,3,4],[5,6]],name='weekday',labels=['weekday','weekend'])
selection_weekday = alt.selection_point(fields=['weekday'], bind=weekday_dropdown)

selection_day = alt.selection_point(fields=['weekday','week','month'])
# choropleth, _ = plot_map(hex_data, mapa, ny_df, bur,selection_buro,selection_cond)


Index(['index_left', 'CRASH DATE', 'CRASH TIME', 'BOROUGH', 'LATITUDE',
       'LONGITUDE', 'VEHICLE TYPE CODE 1', 'date', 'weekday', 'covid',
       'BoroName', 'index', 'BoroCode', 'Shape_Leng', 'Shape_Area', 'datetime',
       'conditions', 'properties.name', 'weekend', 'week_x', 'month', 'week_y',
       'week'],
      dtype='object')


In [35]:
w=1000
geo_view=plot_map(accident_data, selection_cond, selection_buro, selection_month, selection_weekday,w=w,ratio=.9)
weather = weather_chart(accident_data,selection_buro,selection_cond,selection_month,selection_weekday,w=w*0.8,ratio=0.8)
calendar = calendar_chart(accident_data,selection_buro,selection_cond,selection_month,selection_weekday,w=w*0.2)

# .configure_scale(
#     bandPaddingInner=.1).add_params(date_selector,month_selection, weekday_selection)

geo_view & (weather | calendar) 

In [202]:
accident_data = get_weather_data(data, fname="weather.csv")
# make column with weekday number
accident_data['weekday'] = accident_data['date'].dt.weekday

# make column with week number (Series.dt.week is deprecated; use isocalendar() instead)
accident_data['week'] = accident_data['date'].dt.isocalendar().week.astype(int)
# month column
accident_data['month'] = accident_data['date'].dt.month

# for each month get the minimum week number
min_week = accident_data.groupby(['month'])['week'].min().reset_index()
# merge with accident data
accident_data = pd.merge(accident_data,min_week,on='month',how='left')



In [4]:
import altair as alt  # explicit import; `alt` is used below

from graphs import *  # project-local helper module

accident_data = get_accident_data("dataset_v1.csv")
ny = "https://raw.githubusercontent.com/pauamargant/VI_P1/main/resources/new-york-city-boroughs.geojson"
data_geojson_remote = alt.Data(
    url=ny, format=alt.DataFormat(property="features", type="json")
)

# make month dropdown
month_dropdown = alt.binding_select(
    options=[[6, 7, 8, 9], 6, 7, 8, 9],
    name="month",
    labels=["All", "June", "July", "August", "September"],
)
# selection_single is deprecated in Altair 5 in favor of selection_point
selection_month_dropdown = alt.selection_point(
    fields=["month"], bind=month_dropdown, name="month"
)

base = (
    alt.Chart(accident_data)
    .mark_geoshape()  # fill=colors["col3"]
    .properties(
        width=500,
        height=300,
    )
    # .transform_filter(selection_month_dropdown)
    .transform_lookup(
        lookup="name",
        from_=alt.LookupData(data_geojson_remote, "name"),
        as_="geom",
        default="Other",
    )
    .transform_calculate(geometry="datum.geom.geometry", type="datum.geom.type")
    .mark_geoshape()
    .project(type="albersUsa")
    .encode(
        # opacity=alt.condition(selection_buro, alt.value(0.6), alt.value(0.2)),
        color=alt.Color("name:N"),
        tooltip=["name:N"],
    )
    .interactive()
    # .add_params(selection_month_dropdown)
)


