![](src/images/mta_logo.png)

# Introduction to Streamlit Using Python

## Introduction

***
<div class="alert alert-block alert-info">
<b>Welcome!</b> 

In this course, we use Python to explore bus speeds and performance datasets from the <a href="https://new.mta.info/open-data">MTA's open data portal</a>. This will illustrate basic data-analysis tools with <a href="https://pandas.pydata.org/docs/user_guide/index.html">Python's pandas library</a>, data-visualization tools with Python's <a href="https://altair-viz.github.io/">Altair library</a>, and the website-creating capabilities of Python's <a href="https://streamlit.io/">streamlit library</a>. 

Python is the programming language we use to prepare data for and create <a href="https://metrics.mta.info">metrics.mta.info</a>, and it is what you'll use in this demonstration. Upon completion of the course, you will be able to use Python to process and analyze datasets, and create basic webpages visualizing those datasets.

This document is a [Jupyter notebook](https://docs.jupyter.org/en/latest/), integrating Python code and text (markdown) in different 'chunks.' You can open the document in VSCode, and hit the button to the left of code chunks to run them.

These instructions assume you have followed the setup steps in the <a href="https://github.com/nymta/Open-Data-Week-2023">GitHub README</a>. If you have not, please complete these steps before continuing.

</div>

*All rights reserved for course materials below*

### A little bit about our team

<p></p>
<div>
<p style="float: left; padding-right: 10px;"><img src="https://media.licdn.com/dms/image/C5603AQGUx9gG7_f9DQ/profile-displayphoto-shrink_800_800/0/1656627094442?e=1681344000&v=beta&t=NYXGQPzgxug4h2tyuJIxp7H42j_4xbNvFB_v4GMrxbc" height="100px" width="100px"></p>
<p> <b>Rahnuma Tarannum</b> <br>

Rahnuma is currently pursuing a Master's degree in Applied Urban Science and Informatics from New York University. In January of last year, Rahnuma started an internship as a Data Research Associate at the MTA in the Data and Analytics Team. Working alongside colleagues from the Data and Analytic Reporting Team, Rahnuma has successfully developed a new and improved public dashboard (metrics.mta.info) to replace the old one on the MTA website. Presently, Rahnuma is involved in writing queries that will facilitate the updating of the open data portal and the addition of route-level information to the metrics website.

</p>
</div>

<p></p>
<div>
<p style="float: left; padding-right: 10px;"><img src="https://media.licdn.com/dms/image/C4E03AQGDEk3KoYVNjw/profile-displayphoto-shrink_800_800/0/1633185999002?e=1680739200&v=beta&t=Po5ZYz5JKkJD2DceUDr_80Fe-iKl9F8dI-VNnFrXVdY" height="100px" width="100px"></p>
<p> <b>Dan Powers</b> <br>

Dan works as a Senior Data Scientist in the MTA's Data & Analytics Team, joining the agency in October, 2023. As part of the reporting section of the D&A Team, Dan helps develop and maintain internal and external data tools built on Python, including MTA's external metrics website. Dan's work experiences prior to joining the MTA include policy and data analysis work for the cities of Takoma Park and Boston, and the US Government Accountability Office.

</p>
</div>

### Outline

- [**Section 1**](#Section-1): Setup [5 min]
- [**Section 2**](#Section-2): Intro to Streamlit [15 mins]
  - Streamlit background
  - Running streamlit pages 
  - How to add webpage elements in streamlit
- [**Section 3**](#Section-3): Reading in and processing data [20 mins]
  - Reading in data
  - Working with datetimes
  - Renaming columns
  - Column selection
  - Changing data types
  - Changing column types
  - Grouping data
- [**Section 4**](#Section-4): Streamlit inputs + page design, and Altair visualization [20 mins]
  - Background
  - Streamlit inputs
  - Filtering datasets
  - Organizing streamlit pages
  - Visualizing data in streamlit
  - Using `if else` statements for flexible webpage design
- [**Section 5**](#Section-5): Conclusion

## Section 1: Setup

Download the <a href="https://github.com/nymta/Open-Data-Week-2023">GitHub repository for this project</a> as a zip file.

![](src/images/clone-repo.PNG)

Extract the files in this folder to somewhere on your computer by hitting the "Extract all" button.

![](src/images/extract_button.PNG)

Open VSCode from Anaconda Navigator.

![](src/images/anaconda-navigator.PNG)

Click File -> Open Folder in VSCode, and select the folder associated with this project.

![](src/images/open-folder.PNG)

## Section 2: Intro to Streamlit

### Streamlit Background

We use Python's streamlit library to create <a href="https://metrics.mta.info/">MTA's public metrics website</a>. This library makes it easy to understand how your Python scripts translate to what you see in your web browser.

In this section, we will briefly go over some of the functionality of streamlit.

### Running Streamlit Pages

Let's start by opening a project terminal in VSCode.

![](src/images/new_terminal.PNG)

We have set up the <a href="https://github.com/nymta/Open-Data-Week-2023">GitHub repository</a> with scripts that create a basic streamlit website visualizing MTA bus data.

To see a webpage or website created by Python scripts that use streamlit, you run the `streamlit run path_to_pythonfile.py` command in the terminal. Individual scripts correspond to individual streamlit pages. If you structure a project with individual scripts in a `pages/` directory, and a main script in the top-level project directory, you can run the script in the top-level directory to [make a multipage app](https://docs.streamlit.io/library/get-started/multipage-apps) (like you see here with `app.py`).


To see the website created by the Python scripts in the `open-data-week-2023` folder, let's run `streamlit run app.py` in the terminal.

![](src/images/run_apppy.PNG)

You should see the following message appear in your terminal, and a webpage similar to this one should pop up in your default browser:

![](src/images/terminal-message.PNG)

![](src/images/app-startup.PNG)

This means that the webpage is now running locally on your computer at the Local URL, and is viewable by other devices on the same network at the Network URL. If you don't see this, let us know and we can assist.

### Now let's use streamlit 🙌

For this part of the demo, we'll be working in the `pages/bus_speeds.py` script. Open this script now from the explorer tab on the left-hand side of the screen.

![](src/images/explorer.PNG)

Python scripts contain a series of Python commands to be executed in sequence, usually to accomplish a specific goal (think of a recipe). You can execute individual steps in the sequence by highlighting lines of code and hitting `ctrl+enter`. Scripts usually import libraries, which contain collections of functions, classes, and objects that make accomplishing different tasks easier. After importing a library, you can access functions and objects within it.

You can import a library into a script with an `import` statement, and use `as` to give it an alias for accessing the functions within it.

In the `bus_speeds.py` script, we import streamlit first with `import streamlit as st`, and import pandas with `import pandas as pd`. After doing this, we can access streamlit functions and objects with `st.function_name`, and pandas functions and objects with `pd.function_name`.

Tip: Hit `ctrl+space` as you're typing to bring up autocomplete options and documentation of those options

In [1]:
import streamlit as st
import pandas as pd



### How to add webpage elements in streamlit

Streamlit functions typically create elements that display in your browser, and are often interactive. Several functions produce text in your browser, and can help organize a webpage.

There are many different ways to write text in streamlit.

To write titles, we can use:

``` python
st.title('Title name')

```

To write headers, we can use:


``` python
st.header('Header name')

``` 

To write body text:

``` python
st.write("body text goes here")

``` 

For example, `st.title('Insert Text')` produces:

![](attachment:image.png)

In [2]:
st.write(
    (
        "Insert Text"
        " Next sentence,"
        " Next sentence,"
        " Final sentence."
    )
)
st.write("")



![](attachment:image.png)

## Section 3: Reading in and processing data

### Reading in data

MTA data can be found on the [New York State data portal](https://data.ny.gov). In this demonstration, we'll use [MTA Bus Speeds data from 2020 onwards](https://data.ny.gov/Transportation/MTA-Daily-Ridership-Data-Beginning-2020/vxuj-8kew). You can search for other MTA datasets in the catalog using the search button, or look in the MTA Open Data Catalog on the [MTA Open Data Program page](https://new.mta.info/open-data).

You have two options for reading in MTA datasets from the Open Data Portal:

- You can click the 'Export' button, and use the 'Copy link address' URL.

    ![](src/images/export-copylink.PNG)

- If you're working with real-time data, you may want to use the link address found under the 'API' button (API stands for Application Programming Interface). You can download data as a JSON file or CSV file. If you use the API link, you'll want to add a `$limit` query parameter to the end of the URL (e.g., `?$limit=1000000`), or it will only read the first 1,000 rows of the data.

    ![](src/images/mta-api.PNG)
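
As a rough sketch of the API option, the snippet below builds a CSV URL with the `$limit` parameter attached. The `/resource/<dataset id>.csv` endpoint pattern and the `build_api_url` helper are assumptions for illustration only; copy the actual link shown under the 'API' button for your dataset.

```python
from urllib.parse import urlencode


def build_api_url(dataset_id, limit=1000000, domain="data.ny.gov"):
    """Return a CSV API URL with a row limit (hypothetical helper)."""
    # Assumed endpoint pattern; confirm it against the link under the 'API' button
    base = f"https://{domain}/resource/{dataset_id}.csv"
    # safe="$" keeps the dollar sign readable instead of encoding it as %24
    query = urlencode({"$limit": limit}, safe="$")
    return f"{base}?{query}"


api_url = build_api_url("vxuj-8kew")
print(api_url)
```

The resulting string can be passed to `pd.read_csv` in the same way as the 'Export' link.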

You can read the data in with the pandas `read_csv` function.

```python
#MTA Bus Speeds: Beginning 2020 data upload
data_url = "https://data.ny.gov/api/views/vxuj-8kew/rows.csv?accessType=DOWNLOAD&sorting=true"

df = pd.read_csv(data_url) #transform to pandas dataframe

```

To display any dataframe in the web browser, you can use streamlit's `st.dataframe` function. For the dataframe we just read in, you can simply type:

`st.dataframe(df)`

### Working with datetimes

A [datetime object](https://docs.python.org/3/library/datetime.html#datetime-objects) can be reformatted and manipulated using functions built into the [`datetime` module](https://docs.python.org/3/library/datetime.html#). Datetime objects in a datetime column are usually printed in a similar format to this: `2022-02-12 08:56:32.005822`.

You can reformat datetime objects as string objects in different formats using the `datetimeobj.strftime(format)` method, where `format` is a string using [format codes for dates and times](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes). So if you wanted the above datetime object to appear as `February 12, in 2022`, you could use the following script:

In [3]:
# import datetime module
import datetime as dt

# create a datetime object with the dt.datetime() function
datetimeobj = dt.datetime(year = 2022, month = 2, day = 12, hour = 8, minute=56, second=32, microsecond=5822)

print(datetimeobj)

# print data type
print(type(datetimeobj))

print(datetimeobj)

# format the datetime object as a string
datetimeobj = datetimeobj.strftime('%B %d, in %Y')

print("Now after the transformation:")

print(type(datetimeobj))

print(datetimeobj)

2022-02-12 08:56:32.005822
<class 'datetime.datetime'>
2022-02-12 08:56:32.005822
Now after the transformation:
<class 'str'>
February 12, in 2022


In our case, we will be converting a date column to datetimes. Pandas provides a useful `to_datetime` function that we can use to transform all elements in a pandas Series to datetimes (individual columns in pandas [DataFrames are Series](https://pandas.pydata.org/docs/user_guide/dsintro.html)). The line below creates a new column called 'month' in our `bus_speed` dataframe by converting the elements of the 'date' column to datetime objects.

`bus_speed['month'] = pd.to_datetime(bus_speed['date'])` 
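
To see the conversion on something small, here is a minimal sketch using a made-up stand-in for the bus speeds data. The `average_speed` column and all values are assumptions for illustration, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the bus speeds data; values are made up
bus_speed = pd.DataFrame({
    "date": ["2022-01-01", "2022-02-01", "2022-03-01"],
    "average_speed": [8.1, 8.0, 7.9],
})

# 'date' starts out as text
print(bus_speed.dtypes)

# convert the text dates to datetimes in a new column
bus_speed["month"] = pd.to_datetime(bus_speed["date"])

# 'month' now has a datetime dtype
print(bus_speed.dtypes)

# the .dt accessor exposes datetime parts for a whole column at once
print(bus_speed["month"].dt.month)
```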

Other parts of the `datetime` module enable more complex manipulations with datetime objects. For instance, `timedelta` objects represent durations, which you can add to or subtract from dates and times.
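
A short sketch of date arithmetic with `timedelta`, using an arbitrary example date:

```python
import datetime as dt

start = dt.datetime(year=2022, month=2, day=12, hour=8, minute=56, second=32)

# adding a timedelta to a datetime gives a new datetime
one_week_later = start + dt.timedelta(days=7)
print(one_week_later)

# subtracting two datetimes gives a timedelta
elapsed = one_week_later - start
print(elapsed.days)
```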

### Renaming columns

To rename columns, we can use the [`rename` method of pandas DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html). The `columns` parameter lets you rename columns directly; you provide a [dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) whose keys are the current names of the columns you want renamed, and whose values are the new names.

In [4]:
import pandas as pd

# create pandas DataFrame
dfobject = pd.DataFrame({"col1" : range(1, 5),
                         "col2" : ["A", "B", "C", "D"]})

print(dfobject)

dfobject = dfobject.rename(columns={'col1': 'Number column', 'col2': 'String column'})

print(dfobject)

   col1 col2
0     1    A
1     2    B
2     3    C
3     4    D
   Number column String column
0              1             A
1              2             B
2              3             C
3              4             D


### Column selection

Each column in a DataFrame is a Series, so selecting a single column returns a pandas Series. An easy way to create a DataFrame is to provide a dictionary to `pd.DataFrame`, where each key is a column name and each value is a list holding that column's entries (the lists must all be the same length). Below is an example:

In [5]:
import pandas as pd

df = pd.DataFrame({
    'A' : ['one', 'two', 'three', 'four', 'five'],
    'B' : ['a', 'b', 'c', 'd', 'e'],
    'C' : ['1', '2', '3', '4', '5']
    }
)

df

       A  B  C
0    one  a  1
1    two  b  2
2  three  c  3
3   four  d  4
4   five  e  5


If you want to subset the data to specific sets of columns, you can use square brackets [] with a list of the column names you want to limit the data to.

In [6]:
df[['A', 'B']]

       A  B
0    one  a
1    two  b
2  three  c
3   four  d
4   five  e


If you want to select an individual column (a pandas Series), you can list the string name of one column in the brackets.

In [7]:
df['C']

0    1
1    2
2    3
3    4
4    5
Name: C, dtype: object
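
The difference between the two forms of selection can be checked directly. This sketch rebuilds a smaller version of the example DataFrame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['one', 'two', 'three'],
    'C': ['1', '2', '3'],
})

# a single column name returns a Series
as_series = df['C']

# a list of names returns a DataFrame, even if the list has one name
as_frame = df[['C']]

print(type(as_series))
print(type(as_frame))
```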

### Changing data types

The ***dtypes*** attribute of a dataframe returns a Series with the data type of each column. The index of the Series is the original DataFrame’s columns. Columns holding strings or mixed types are represented with a dtype of 'object.'

In [8]:
df.dtypes

A    object
B    object
C    object
dtype: object
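
As a small illustration of the mixed-type case, the sketch below (with made-up column names) compares a column of integers with a column mixing numbers and text:

```python
import pandas as pd

mixed_df = pd.DataFrame({
    'all_numbers': [1, 2, 3],
    'mixed': [1, 'two', 3.0],
})

# 'all_numbers' gets an integer dtype; 'mixed' falls back to object
print(mixed_df.dtypes)
```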

### Changing column types

Sometimes, we need to change the data type of data we read in. This can be done using the `astype` method of pandas Series.

In [9]:
# here's another way to make a dataframe
df2 = pd.DataFrame(
    columns=['A','B'], 
    data=[
        ['1.1','1'],
        ['1.2','2'],
        ['1.3','3'],
        ['1.4','4'],
        ['1.5','5']])

df2

     A  B
0  1.1  1
1  1.2  2
2  1.3  3
3  1.4  4
4  1.5  5


It's generally a good practice to check the type of our data by looking at the `dtypes` attribute of dataframes.

In [10]:
df2.dtypes

A    object
B    object
dtype: object

In [11]:
df2['A'] = df2['A'].astype(float)
df2['B'] = df2['B'].astype(int)

df2.dtypes

A    float64
B      int32
dtype: object

### Grouping data

A DataFrame's `groupby()` method involves splitting the object, applying a function to each split, and combining the results. This can be used to quickly group large amounts of data and compute operations on these groups.

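We can use `groupby()` to find sums, averages, and other summaries of each group. The `agg()` method accepts a function, the name of a function as a string, a list of either, or a dictionary mapping columns to the aggregation to use for each.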

In [12]:
df3 = pd.DataFrame({'A': ['Cat', 'Cat', 'Dog', 'Dog'],
                        'B': [1, 2, 3, 4],
                        'C': [0.1,0.2,0.3,0.4]})

df3

     A  B    C
0  Cat  1  0.1
1  Cat  2  0.2
2  Dog  3  0.3
3  Dog  4  0.4


In [14]:
df_grp = df3.groupby('A').agg({'B': 'sum','C': 'mean'})
df_grp

     B     C
A
Cat  3  0.15
Dog  7  0.35


In [None]:
# there are also other ways to aggregate
df_grp2 = df3.groupby('A').agg('sum')
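
One more option, sketched here with output column names chosen just for illustration, is pandas' named aggregation, which lets you name each result column as you aggregate:

```python
import pandas as pd

df3 = pd.DataFrame({'A': ['Cat', 'Cat', 'Dog', 'Dog'],
                    'B': [1, 2, 3, 4],
                    'C': [0.1, 0.2, 0.3, 0.4]})

# each keyword is a new column name; each tuple is (source column, aggregation)
df_named = df3.groupby('A').agg(
    total_B=('B', 'sum'),
    mean_C=('C', 'mean'),
)

print(df_named)
```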

## Section 4: Streamlit inputs + page design, and Altair visualization

### Background

Streamlit has a number of functions that allow for flexible webpage designs and ways to organize information. A useful feature of streamlit is that, after using `streamlit run pythonscript.py` to open a page in your browser, you can hit the re-run button in the top right of the screen to re-run your script and see your changes reflected on the webpage. This section is intended to be a helpful reference as we go through the `pages/bus_cjfm.py` script.

![](src/images/rerun.PNG)

In `pages/bus_cjfm.py`, we're going to explore different webpage design and visualization options using [Bus Customer Journey Focused Metrics Data](https://data.ny.gov/Transportation/MTA-Bus-Customer-Journey-Focused-Metrics-Beginning/wrt8-4b59) from 2020 onwards.

### Streamlit inputs

Virtually all streamlit functions make something appear in your browser. Many streamlit functions accept inputs from users, returning values that you can store in variables. Each time a user changes a value, the script re-runs from the beginning. You can use the values stored in these variables to create dynamic webpages that let users filter, transform, and visualize datasets based on the selections they make.

- `st.selectbox()` displays a drop-down select box in the browser, showing a `list` of options you make available. It returns the selected option (a `string` if your options are strings). `st.radio()` functions similarly, but shows radio buttons instead.
- `st.multiselect()` functions similarly to `st.selectbox()`, but lets users select multiple or no values. It returns a `list` of the selected values.
- `st.date_input()` lets the user enter one or two dates, depending on whether you provide one date or a list of two dates to the `value` parameter. It returns a `date` object if you provide one date, or a tuple of `date` objects if you provide two.
- `st.checkbox()` displays a checkbox. It returns `True` if the user checks it, and `False` if it's not checked.

There are many other input options in streamlit, including `st.form`, and `st.text_input`.

### Filtering datasets

You can filter datasets based on inputs. The `.loc[]` indexer of pandas DataFrames makes it easy to filter DataFrames to rows matching conditions you specify, by providing a boolean (True/False) Series of equal length to the dataframe (this sounds more complicated than it is; check the code below for some examples). You can also supply multiple conditions by wrapping each condition in parentheses and combining them with the `&` (and) operator. The next code block illustrates some common types of filters you're likely to use in streamlit:

In [None]:
df3 = pd.DataFrame({'A': ['Cat', 'Cat', 'Dog', 'Dog', "Fish"],
                        'B': [1, 2, 3, 4, 6],
                        'C': [0.1,0.2,0.3,0.4, 0.6]})

# let's try one filter
filt1 = df3.loc[df3.B > 2]

print(filt1)

# a column filter
filt2 = df3.loc[df3["A"] == "Cat"]

print(filt2)

# an isin filter
filt3 = df3.loc[df3["A"].isin(['Cat', "Fish"])]

print(filt3)

# combining filters
filt4 = df3.loc[(df3["A"] == "Dog") & (df3["B"] == 4)]

print(filt4)

      A  B    C
2   Dog  3  0.3
3   Dog  4  0.4
4  Fish  6  0.6
     A  B    C
0  Cat  1  0.1
1  Cat  2  0.2
      A  B    C
0   Cat  1  0.1
1   Cat  2  0.2
4  Fish  6  0.6
     A  B    C
3  Dog  4  0.4
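
To connect these filters to user inputs, one possible pattern is to wrap the filter in a function and pass it the values your widgets return. The sketch below uses a hypothetical `filter_animals` helper and hard-coded values in place of widgets, so it runs outside of streamlit; the widget calls in the comments are only one way you might wire it up.

```python
import pandas as pd


def filter_animals(df, animals, min_b):
    """Keep rows whose 'A' value is in `animals` and whose 'B' is at least `min_b`."""
    return df.loc[(df["A"].isin(animals)) & (df["B"] >= min_b)]


df3 = pd.DataFrame({'A': ['Cat', 'Cat', 'Dog', 'Dog', "Fish"],
                    'B': [1, 2, 3, 4, 6],
                    'C': [0.1, 0.2, 0.3, 0.4, 0.6]})

# In a streamlit page, these values might come from widgets, for example:
#   animals = st.multiselect("Animals", df3["A"].unique())
#   min_b = st.number_input("Minimum B", value=2)
animals = ["Cat", "Fish"]
min_b = 2

filtered = filter_animals(df3, animals, min_b)
print(filtered)
```

Keeping the filter in its own function makes it easy to try out with fixed values before adding any widgets.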


### Organizing streamlit pages

In addition to the text-writing functions we went over earlier, streamlit has a number of functions for organizing the appearance of a webpage. These can take advantage of `with` notation; we won't go over the behind-the-scenes of this too much, but essentially, after initializing an organization element, if you write `with element:`, everything indented under that element will appear in it.

Streamlit layout options include:

- `st.columns()`: Creates columns across a page. You can either provide a number which creates that number of equal-width columns, or a list of widths which sizes the columns relative to the width (so `st.columns([1, 1, 2])` sizes the third column twice as wide as the first two).
- `st.tabs()`: Creates tabs that only display content within the tab if the tab is selected. You input a list of names for each tab
- `st.expander()`: Creates an expander. If the user clicks on it, it unfolds revealing the elements inside.
- `st.sidebar`: Creates a sidebar on the page

In [None]:
import streamlit as st

# the comma here creates two objects - column 1, and column 2
col1, col2 = st.columns(2)

with col1:
    st.write("I'm in column 1")

with col2:
    st.write("I'm in column 2")

with st.expander("Expander label"):

    st.write("I appear if you unfold the expander")

    st.write("So do I")

tab1, tab2 = st.tabs(["First tab", "Second tab"])

with tab1:
    st.write("I appear in the first tab")

with tab2:
    st.write("I appear in the second tab")
    
with st.sidebar:
    st.write("I appear in the sidebar")


### Visualizing data in streamlit

The [Altair library](https://altair-viz.github.io/) makes it easy to make interactive visualizations, and streamlit's `st.altair_chart()` function lets you display them.

The [typical steps](https://altair-viz.github.io/getting_started/starting.html) to create an Altair visualization are

1) Create a Chart object from a dataframe with `alt.Chart(dataframe)`
2) Identify how the data should be visualized with a `Chart.mark_*()` method (e.g., `Chart.mark_line()`, `Chart.mark_bar()`)
3) Map dataframe columns to features with the `Chart.mark_*().encode()` method

In [None]:
import altair as alt

df3 = pd.DataFrame({'A': ['Cat', 'Cat', 'Dog', 'Dog', "Fish"],
                        'B': [1, 2, 3, 4, 6],
                        'C': [0.1,0.2,0.3,0.4, 0.6]})

alt.Chart(df3).mark_bar().encode(
    x = 'A',
    y = 'B'
)

You can use the `color` parameter to show different groups on your visualization. Often, Altair will have functions corresponding to individual parameters that let you define options more specifically (e.g., `alt.X()` and `alt.Y()` for the `x` and `y` parameters of the `encode()` method). You can use the `properties` and `configure` methods of charts to make further specifications, and adding the `interactive()` method at the end makes the chart display interactively.

In [None]:
alt.Chart(df3).mark_point().encode(
    x = alt.X(field='B', type = "ordinal", title = 'Time'),
    y = alt.Y(
        field = 'C', 
        type='quantitative', 
        title = 'Proportion', 
        # range of y axis
        scale = alt.Scale(domain=[0, 1]),
    ),
    # pop up text when you hover over
    tooltip = [
        alt.Tooltip('C'), 
        alt.Tooltip('A')
    ],
    color= alt.Color(field='A', title = 'Animal', type='nominal'),
    shape = 'A'
).properties(
    title = ("Proportions of animals at different times of day (????)")
).interactive()


### Using `if` `else` statements for flexible webpage design

What if you only want code to run under certain conditions? Or you want some lines of code to run under one set of conditions, and other code to run under other conditions? Using `if`, `elif`, and `else` statements, you can write code--and in turn, design webpages--that account for the different circumstances you might want to allow for.

`if` and `elif` statements use a syntax of:

```python
if boolean_condition:
    code_executed_if_that_condition_is_True
```

`else` must come after an `if` or `elif` statement, and does not have a condition (because the implicit condition is the failure of the previous conditions). Python does not *require* an `else` or `elif` statement after an `if` statement; they're just available if you need them.
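
Before mixing in streamlit, here is a plain-Python sketch of the three statements working together. The `describe_speed` function and its thresholds are made up for illustration.

```python
def describe_speed(mph):
    """Label a bus speed; the thresholds here are arbitrary examples."""
    if mph < 7:
        label = "slow"
    elif mph < 9:
        # only checked if the first condition was False
        label = "typical"
    else:
        # runs only if every condition above was False
        label = "fast"
    return label


for speed in [6.5, 8.2, 10.0]:
    print(speed, describe_speed(speed))
```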

Mixing these statements with streamlit functions can help you make webpages that break less often, and that adapt to the data your user is working with.

In [None]:
df3 = pd.DataFrame({'A': ['Cat', 'Cat', 'Dog', 'Dog', "Fish"],
                        'B': [1, 2, 3, 4, 6],
                        'C': [0.1,0.2,0.3,0.4, 0.6],
                        'D' : None})

selectcol = st.selectbox("Select which column you want to visualize",
             df3.columns)

if selectcol == "A":
    
    categories = df3[selectcol].unique()

    st.write(f"Unique values in column A are {categories}")

elif selectcol == "B":

    sumval = df3[selectcol].sum()

    st.write(f"The sum of column B is {sumval}")

elif selectcol == "C":
    meanval = df3[selectcol].mean()

    st.write(f"The mean of column C is {meanval}")

else:

    st.write("Column D is empty")


## Section 5: Conclusion

Thank you all for attending today's session. If you have any feedback or questions, we can be reached at [rahnuma.tarannum@nyct.com](mailto:rahnuma.tarannum@nyct.com) and [dan.powers@mtahq.org](mailto:dan.powers@mtahq.org). If you have questions about open data at the MTA, please contact [opendata@mtahq.org](mailto:opendata@mtahq.org).