<a href="https://colab.research.google.com/github/lcbjrrr/ProgWdata/blob/main/CSC116_StreamLit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Streamlit



Streamlit is an open-source Python framework designed to help developers create interactive web applications for data science and machine learning projects. It allows users to build apps quickly with minimal code by converting Python scripts into web apps. Streamlit automatically handles UI elements like sliders, buttons, and charts, making it easy to visualize data and deploy machine learning models without requiring extensive web development knowledge. It is popular among data scientists for rapid prototyping and visualization.

In [1]:
!pip install streamlit

Collecting streamlit
  Downloading streamlit-1.50.0-py3-none-any.whl.metadata (9.5 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.50.0-py3-none-any.whl (10.1 MB)
Downloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
Installing collected packages: pydeck, streamlit
Successfully installed pydeck-0.9.1 streamlit-1.50.0


In [2]:
!npm install localtunnel

added 22 packages in 4s

3 packages are looking for funding
  run `npm fund` for details

## Data Analysis: Bar chart

The dataset contains information about auto insurance across various states in the U.S. It includes details such as state abbreviations, capitals, insurance price levels, population, geographical coordinates (latitude and longitude), and regional divisions. The columns also contain yearly insurance premiums (for 2021, 2022, and 2023) for each state, and it is organized by regions like Northeast, Midwest, and South. This data can be useful for analysis of insurance pricing trends across different states and regions.

A bar chart is an effective way to compare numerical data across categories. In the example with the df dataset, using st.bar_chart(data=df, x='state', y='Y2023'), you can compare the 2023 insurance premiums (Y2023) for each state. The x-axis represents the states, while the y-axis shows the corresponding premium values. This visualization makes it easy to identify patterns, such as which states have the highest or lowest insurance premiums, helping in data-driven decision-making.

In [3]:
%%writefile bar.py
import streamlit as st
import pandas as pd
df=pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/ProgWData/refs/heads/main/autoinsurance.csv')
st.write(df)
st.bar_chart(data=df,x='state',y='Y2023')

Writing bar.py
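
Reading the highest and lowest bars off the chart amounts to sorting the premiums. The sketch below shows that step in plain Python; the state codes S1 to S4 and their values are made-up placeholders, not rows from the dataset.

```python
# Hypothetical 2023 premiums keyed by placeholder state codes (not real data)
premiums_2023 = {"S1": 1450.0, "S2": 980.0, "S3": 2100.0, "S4": 1210.0}

# Sort from the highest premium to the lowest
ranked = sorted(premiums_2023.items(), key=lambda item: item[1], reverse=True)

highest = ranked[0]   # tallest bar in the chart
lowest = ranked[-1]   # shortest bar in the chart

print("highest:", highest)
print("lowest:", lowest)
```

With the real dataframe, the same question could be answered by sorting df on the Y2023 column before charting.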


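To preview the app from Colab, start Streamlit in the background (it serves on port 8501 by default), look up the notebook's public IP address, which localtunnel asks for as the tunnel password, and then open a tunnel to port 8501.

In [4]:
!streamlit run /content/bar.py &>/content/logs.txt &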


In [5]:
!curl https://loca.lt/mytunnelpassword


34.30.128.123
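
Because Streamlit is started in the background, the tunnel can come up before the server is listening. One way to guard against that is to poll the port first. This is only a sketch: wait_for_port is a hypothetical helper, not part of Streamlit or localtunnel, and 8501 is assumed to be the port in use.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling wait_for_port(8501) in a cell before the localtunnel cell would return True as soon as the app accepts connections, or False if it never starts within the timeout.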

In [6]:
!npx localtunnel --port 8501 &


your url is: https://slick-bobcats-move.loca.lt



## Aggregation



We can group our data by the 'Region' column to calculate the average insurance premium for 2023 (Y2023) in each region. The resulting average values are displayed in a table, and a bar chart visualizes the comparison of insurance premiums across different regions. This approach helps to summarize regional insurance trends efficiently, highlighting differences in premiums between regions.


In [7]:
%%writefile agg.py
import streamlit as st
import pandas as pd
df=pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/ProgWData/refs/heads/main/autoinsurance.csv')
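# Average 2023 premium per region
agg = df.groupby('Region')['Y2023'].mean()
agg = agg.reset_index()  # turn the 'Region' index back into a column
st.write(agg)
# Chart the aggregated averages, not the raw per-state rows
st.bar_chart(data=agg,x='Region',y='Y2023')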

Writing agg.py
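
To make the group-and-average step concrete, here is a plain-Python sketch of roughly what df.groupby('Region')['Y2023'].mean() computes. The rows are made up for illustration, and mean_by_group is a hypothetical helper, not a pandas function.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; the real values come from autoinsurance.csv
rows = [
    {"Region": "Northeast", "state": "S1", "Y2023": 1200.0},
    {"Region": "Northeast", "state": "S2", "Y2023": 1400.0},
    {"Region": "Midwest", "state": "S3", "Y2023": 900.0},
    {"Region": "Midwest", "state": "S4", "Y2023": 1000.0},
    {"Region": "Midwest", "state": "S5", "Y2023": 1100.0},
    {"Region": "South", "state": "S6", "Y2023": 1500.0},
]


def mean_by_group(records, key, value):
    """Collect the values for each group, then average each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {name: mean(values) for name, values in groups.items()}


avg = mean_by_group(rows, "Region", "Y2023")
print(avg)  # → {'Northeast': 1300.0, 'Midwest': 1000.0, 'South': 1500.0}
```

Each region ends up with a single number, which is why the chart of the aggregated result has one bar per region.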


## Pie chart

A pie chart is ideal for showing proportions within a dataset. In this case, using agg = df.groupby('Region')['state'].count(), you can count the number of states in each region and visualize these counts as a pie chart. Each slice of the pie chart represents the proportion of states in each region, providing an easy-to-understand comparison of regional distribution. This helps to see how states are distributed across different regions in the dataset.


In [None]:
%%writefile pie.py
import streamlit as st
import pandas as pd
df=pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/ProgWData/refs/heads/main/autoinsurance.csv')
agg = df.groupby('Region')['state'].count()
#agg = agg.reset_index()
st.write(agg)
st.pyplot(agg.plot(kind='pie').figure)
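
Each slice of the pie is a count divided by the total. The sketch below shows that arithmetic in plain Python, using a made-up list of region labels rather than the real dataset.

```python
from collections import Counter

# Made-up region label for each of eight hypothetical states
regions = ["Northeast", "Northeast", "Midwest", "Midwest",
           "South", "South", "South", "South"]

counts = Counter(regions)          # number of states per region
total = sum(counts.values())       # total number of states

# Share of the pie for each region, as a fraction of the whole
shares = {region: n / total for region, n in counts.items()}

for region in sorted(shares):
    print(f"{region}: {counts[region]} states, {shares[region]:.1%} of the pie")
```

The shares always add up to 1, which is what makes a pie chart suitable for parts of a whole.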

## Line chart

A line chart is useful for displaying trends over time. In this example, using st.line_chart(data=life, x='year', y=['own', 'dont_own']), you can plot two variables, 'own' and 'dont_own', against the year. Each variable is drawn as its own line, which makes it easy to spot patterns and fluctuations, such as whether the number of people who own life insurance has grown over the years relative to those who do not.


In [10]:
%%writefile line.py
import streamlit as st
import pandas as pd
life = pd.read_csv("https://raw.githubusercontent.com/lcbjrrr/ProgWData/refs/heads/main/life_insurance.csv")
st.write(life)
st.line_chart(data=life,x='year',y=['own','dont_own'])

Writing line.py
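
One question a two-line chart answers at a glance is when one series overtakes the other. Below is a plain-Python sketch of that check. The numbers are invented for illustration and do not come from life_insurance.csv, and first_crossover is a hypothetical helper.

```python
# Invented values for illustration only
years = [2019, 2020, 2021, 2022, 2023]
own = [48, 49, 51, 52, 54]
dont_own = [52, 51, 49, 48, 46]


def first_crossover(xs, a, b):
    """Return the first x where series a exceeds series b, or None if it never does."""
    for x, a_value, b_value in zip(xs, a, b):
        if a_value > b_value:
            return x
    return None


print(first_crossover(years, own, dont_own))  # → 2021
```

On the chart, this is the year where the two lines cross.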


## Scatterplot

A scatterplot is effective for visualizing the correlation or association between two variables. In this example, using st.scatter_chart(data=auto, x='age', y='premium'), you can explore how the age of individuals correlates with their premium values. Each point on the scatterplot represents a data pair, allowing you to visually assess if there's a relationship, such as whether older individuals tend to have higher premiums, or if there is no noticeable trend.



In [11]:
%%writefile scatter.py
import streamlit as st
import pandas as pd
auto = pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/ProgWdata/refs/heads/main/autos.csv')
st.write(auto)
st.scatter_chart(data=auto,x='age',y='premium' )

Writing scatter.py
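
A scatterplot suggests a relationship visually; the Pearson correlation coefficient puts a number on it. The sketch below computes it in plain Python on made-up age and premium pairs (not rows from autos.csv). pearson_r is a hypothetical helper written out for clarity.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up pairs for illustration only
ages = [22, 30, 41, 55, 63]
premiums = [980, 870, 760, 700, 720]

r = pearson_r(ages, premiums)
print(f"r = {r:.2f}")  # negative here: premiums fall as age rises in this toy data
```

Values near +1 or -1 indicate a strong linear trend, while values near 0 indicate little or no linear relationship.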


## Histogram

A histogram is a type of bar chart used to represent the distribution of numerical data by grouping data points into bins. In the example df['Y2023'].plot(kind='hist', bins=[0,500,800,1000,5000]), it shows how the 2023 insurance premiums (Y2023) are distributed across the ranges (bins) 0-500, 500-800, 800-1000, and 1000-5000. Note that these bins have unequal widths. This visualization shows how many premiums fall in each range, giving quick insight into the distribution of the data.


In [12]:
%%writefile hist.py
import streamlit as st
import pandas as pd
df=pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/ProgWData/refs/heads/main/autoinsurance.csv')
st.write(df)
# Histogram of 2023 premiums with custom, unequal-width bins
st.pyplot(df['Y2023'].plot(kind='hist',bins=[0,500,800,1000,5000]).figure)

Writing hist.py
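
To see what binning does, the sketch below counts made-up premiums per bin in plain Python. It follows the usual NumPy/Matplotlib convention that each bin includes its left edge and excludes its right edge, except the last bin, which includes both. bin_counts is a hypothetical helper, not a library function.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Count values per bin: [left, right) for every bin, [left, right] for the last."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        if value < edges[0] or value > edges[-1]:
            continue  # value lies outside every bin and is ignored
        index = bisect_right(edges, value) - 1
        index = min(index, len(counts) - 1)  # value == last edge goes in the last bin
        counts[index] += 1
    return counts


edges = [0, 500, 800, 1000, 5000]
# Made-up premiums for illustration only
premiums = [450, 500, 799, 800, 950, 1000, 1200, 5000, 6000]

print(bin_counts(premiums, edges))  # → [1, 2, 2, 3]
```

The value 6000 is dropped because it lies beyond the last edge, which is worth remembering when choosing custom bins.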


## Boxplot

A boxplot is a graphical representation of the distribution of numerical data through quartiles. The box spans the first quartile (Q1) to the third quartile (Q3) with a line at the median, and the "whiskers" extend to the smallest and largest values that are not flagged as outliers. Outliers are typically marked as individual points beyond the whiskers. In the example df['Y2023'].plot(kind='box'), the boxplot visualizes the distribution of the Y2023 insurance premiums, helping to quickly assess the data's range, central tendency, skewness, and any potential outliers.


In [13]:
%%writefile box.py
import streamlit as st
import pandas as pd
df=pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/ProgWData/refs/heads/main/autoinsurance.csv')
# Summary statistics (count, mean, quartiles, min, max) for the 2023 premiums
st.write(df['Y2023'].describe())
st.pyplot(df['Y2023'].plot(kind='box').figure)

Writing box.py
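
The numbers behind a boxplot can be worked out by hand. The sketch below does so in plain Python on made-up premiums, assuming the common rule that points more than 1.5 times the interquartile range beyond the box count as outliers (Matplotlib's default). box_stats is a hypothetical helper.

```python
from statistics import quantiles


def box_stats(values):
    """Quartiles, whisker ends and outliers using the 1.5 * IQR rule."""
    data = sorted(values)
    # 'inclusive' interpolates linearly between data points
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],     # smallest value that is not an outlier
        "whisker_high": inside[-1],   # largest value that is not an outlier
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Made-up premiums for illustration only
premiums = [700, 800, 900, 1000, 1100, 1200, 1300, 1400, 3000]
stats = box_stats(premiums)
print(stats["q1"], stats["median"], stats["q3"])  # → 900.0 1100.0 1300.0
print(stats["outliers"])                          # → [3000]
```

Here the upper whisker stops at 1400 and the value 3000 is drawn as a separate point.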


## Maps

To work with map data in Streamlit, the st.map() function can be used to plot geographic data points on a map. The df dataframe contains latitude (lat) and longitude (lon) columns for geographic coordinates. The size parameter sets the marker radius in meters and accepts either a number or the name of a column. By calling st.map(df, latitude="lat", longitude="lon", size=2300), the geographic data is shown on an interactive map, making it easier to explore location-based trends.


In [14]:
%%writefile map.py
import streamlit as st
import pandas as pd
df=pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/ProgWData/refs/heads/main/autoinsurance.csv')
#st.write(df)
# One marker per row, positioned by the lat/lon columns
st.map(df, latitude="lat", longitude="lon", size=2300)

Writing map.py
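
Map plotting needs numeric coordinates, and rows with missing or out-of-range values may cause errors or misplaced markers. The sketch below shows a simple plausibility check in plain Python. The points are hypothetical and is_plottable is an illustrative helper, not a Streamlit function.

```python
import math

# Hypothetical points for illustration; real values come from the lat/lon columns
points = [
    {"name": "P1", "lat": 40.0, "lon": -75.0},
    {"name": "P2", "lat": None, "lon": -90.0},         # missing latitude
    {"name": "P3", "lat": 35.5, "lon": float("nan")},  # NaN longitude
    {"name": "P4", "lat": 95.0, "lon": -100.0},        # latitude out of range
    {"name": "P5", "lat": 30.25, "lon": -97.75},
]


def is_plottable(lat, lon):
    """True when both coordinates are present, not NaN, and within valid ranges."""
    if lat is None or lon is None:
        return False
    if math.isnan(lat) or math.isnan(lon):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


plottable = [p for p in points if is_plottable(p["lat"], p["lon"])]
print([p["name"] for p in plottable])  # → ['P1', 'P5']
```

With a dataframe, a comparable cleanup would be to drop rows with missing lat or lon before calling st.map.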
