<a href="https://colab.research.google.com/github/lcbjrrr/quantai/blob/main/A4_Py_Proc_Vis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Streamlit



Streamlit is an open-source Python framework designed to help developers create interactive web applications for data science and machine learning projects. It allows users to build apps quickly with minimal code by converting Python scripts into web apps. Streamlit automatically handles UI elements like sliders, buttons, and charts, making it easy to visualize data and deploy machine learning models without requiring extensive web development knowledge. It is popular among data scientists for rapid prototyping and visualization.

In [None]:
!pip install streamlit

In [None]:
!npm install localtunnel

The examples below use an auto insurance dataset covering U.S. states. It includes state abbreviations, capitals, insurance price levels, population, geographic coordinates (latitude and longitude), and regional divisions such as Northeast, Midwest, and South. It also contains yearly insurance premiums for 2021, 2022, and 2023 for each state, which makes it useful for analyzing pricing trends across states and regions.

# Bar plot

A bar chart is an effective way to compare numerical data across categories. In the example with the df dataset, using st.bar_chart(data=df, x='state', y='Y2023'), you can compare the 2023 insurance premiums (Y2023) for each state. The x-axis represents the states, while the y-axis shows the corresponding premium values. This visualization makes it easy to identify patterns, such as which states have the highest or lowest insurance premiums, helping in data-driven decision-making.

In [1]:
%%writefile bar.py
import streamlit as st
import pandas as pd
df=pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/quantai/refs/heads/main/datasets/autoinsurance.csv')
st.write(df)
st.bar_chart(data=df,x='state',y='Y2023')

Writing bar.py


In [2]:
!streamlit run /content/bar.py &>/content/logs.txt &

In [3]:
!curl https://loca.lt/mytunnelpassword

34.16.192.201

In [6]:
!npx localtunnel --port 8501 &

npm warn exec The following package was not found and will be installed: localtunnel@2.0.2
your url is: https://all-bears-bet.loca.lt

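To back up what the bar chart shows with exact numbers, the highest and lowest premiums can also be pulled out with pandas. The sketch below is only an illustration: it uses a small hypothetical sample with the same state and Y2023 columns as the dataset, and the premium values are made up.

```python
import pandas as pd

# Hypothetical sample with the same 'state' and 'Y2023' columns as the dataset;
# the premium values are made up for illustration.
df = pd.DataFrame({
    "state": ["NY", "OH", "TX", "CA", "ME"],
    "Y2023": [1600, 900, 1400, 1500, 800],
})

# Two states with the highest and two with the lowest 2023 premium
highest = df.nlargest(2, "Y2023")["state"].tolist()
lowest = df.nsmallest(2, "Y2023")["state"].tolist()

print(highest)  # ['NY', 'CA']
print(lowest)   # ['ME', 'OH']
```

Inside a Streamlit script, the same calls could be applied to the full df and the results displayed with st.write().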

# Aggregation

We can group our data by the 'Region' column to calculate the average insurance premium for 2023 (Y2023) in each region. The resulting average values are displayed in a table, and a bar chart visualizes the comparison of insurance premiums across different regions. This approach helps to summarize regional insurance trends efficiently, highlighting differences in premiums between regions.

In [1]:
%%writefile agg.py
import streamlit as st
import pandas as pd
df=pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/quantai/refs/heads/main/datasets/autoinsurance.csv')
agg = df.groupby('Region')['Y2023'].mean()
agg = agg.reset_index()
st.write(agg)
st.bar_chart(data=agg,x='Region',y='Y2023')


Writing agg.py


In [2]:
!streamlit run /content/agg.py &>/content/logs.txt &

In [4]:
!curl https://loca.lt/mytunnelpassword

34.16.192.201

In [5]:
!npx localtunnel --port 8501 &

npm warn exec The following package was not found and will be installed: localtunnel@2.0.2
your url is: https://lazy-yaks-like.loca.lt

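A small worked example makes the aggregation step concrete. The sample below is hypothetical (made-up premiums, same Region and Y2023 columns as the dataset); it shows that grouping and averaging returns one row per region, which is the table a regional bar chart should be drawn from.

```python
import pandas as pd

# Hypothetical sample; premium values are made up for illustration
df = pd.DataFrame({
    "state": ["NY", "ME", "OH", "IL", "TX"],
    "Region": ["Northeast", "Northeast", "Midwest", "Midwest", "South"],
    "Y2023": [1600, 800, 900, 1100, 1400],
})

# One average per region; reset_index turns 'Region' back into a regular column
agg = df.groupby("Region")["Y2023"].mean().reset_index()

print(agg["Region"].tolist())  # ['Midwest', 'Northeast', 'South']
print(agg["Y2023"].tolist())   # [1000.0, 1200.0, 1400.0]
```

Passing a table like agg to st.bar_chart(data=agg, x='Region', y='Y2023') gives one bar per region at its average value.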

# Pie chart


A pie chart is ideal for showing proportions within a dataset. In this case, using agg = df.groupby('Region')['state'].count(), you can count the number of states in each region and visualize these counts as a pie chart. Each slice of the pie chart represents the proportion of states in each region, providing an easy-to-understand comparison of regional distribution. This helps to see how states are distributed across different regions in the dataset.

In [1]:
%%writefile pie.py
import streamlit as st
import pandas as pd
df=pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/quantai/refs/heads/main/datasets/autoinsurance.csv')
agg = df.groupby('Region')['state'].count()
#agg = agg.reset_index()
st.write(agg)
st.pyplot(agg.plot(kind='pie').figure)

Writing pie.py


In [2]:
!streamlit run /content/pie.py &>/content/logs.txt &

In [3]:
!curl https://loca.lt/mytunnelpassword

34.145.148.40

In [4]:
!npx localtunnel --port 8501 &

npm warn exec The following package was not found and will be installed: localtunnel@2.0.2
your url is: https://few-nails-write.loca.lt

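Each slice of a pie chart corresponds to a count divided by the total. As a rough illustration, the sketch below computes those shares explicitly on a hypothetical five-state sample (the rows are made up; only the Region and state column names come from the dataset).

```python
import pandas as pd

# Hypothetical sample for illustration
df = pd.DataFrame({
    "state": ["NY", "ME", "OH", "IL", "TX"],
    "Region": ["Northeast", "Northeast", "Midwest", "Midwest", "South"],
})

# Number of states per region, as in pie.py
counts = df.groupby("Region")["state"].count()

# Share of each region in percent: the size of each pie slice
share_pct = (counts / counts.sum() * 100).round(1)

print(share_pct.to_dict())  # {'Midwest': 40.0, 'Northeast': 40.0, 'South': 20.0}
```

Showing share_pct next to the chart with st.write() can make small differences between slices easier to read.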

# Line Chart

A line chart is useful for displaying trends over time. In this example, using st.line_chart(data=life, x='year', y=['own', 'dont_own']), you can plot the trend of two variables — 'own' and 'dont_own' — over the years. This allows for a clear visualization of how each variable changes over time, helping to identify patterns, fluctuations, or trends, such as whether more people started owning insurance over the years compared to those who don't own it.

In [1]:
%%writefile line.py
import streamlit as st
import pandas as pd
life = pd.read_csv("https://raw.githubusercontent.com/lcbjrrr/quantai/refs/heads/main/datasets/life_insurance.csv")
st.write(life)
st.line_chart(data=life, x='year', y=['own', 'dont_own'])

Writing line.py


In [2]:
!streamlit run /content/line.py &>/content/logs.txt &

In [3]:
!curl https://loca.lt/mytunnelpassword

35.245.34.90

In [4]:
!npx localtunnel --port 8501 &

npm warn exec The following package was not found and will be installed: localtunnel@2.0.2
your url is: https://stale-apes-sink.loca.lt

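A trend that is visible in a line chart can also be checked numerically, for example by finding the first year in which one line rises above the other. The sketch below assumes a table with the same year, own, and dont_own columns; the values are invented for illustration and are not taken from the life insurance dataset.

```python
import pandas as pd

# Hypothetical values for illustration only
life = pd.DataFrame({
    "year": [2019, 2020, 2021, 2022],
    "own": [48, 50, 52, 54],
    "dont_own": [52, 50, 48, 46],
})

# Gap between the two lines for each year
life["gap"] = life["own"] - life["dont_own"]

# First year in which 'own' is strictly above 'dont_own'
crossover = int(life.loc[life["gap"] > 0, "year"].min())

print(crossover)  # 2021
```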

# Scatterplot

A scatterplot is effective for visualizing the correlation or association between two variables. In this example, using st.scatter_chart(data=auto, x='age', y='premium'), you can explore how the age of individuals correlates with their premium values. Each point on the scatterplot represents a data pair, allowing you to visually assess if there's a relationship, such as whether older individuals tend to have higher premiums, or if there is no noticeable trend.

In [1]:
%%writefile scatter.py
import streamlit as st
import pandas as pd
auto = pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/quantai/refs/heads/main/datasets/autos.csv')
st.write(auto)
st.scatter_chart(data=auto,x='age',y='premium' )

Writing scatter.py


In [2]:
!streamlit run /content/scatter.py &>/content/logs.txt &

In [3]:
!curl https://loca.lt/mytunnelpassword

34.16.182.210

In [4]:
!npx localtunnel --port 8501 &

npm warn exec The following package was not found and will be installed: localtunnel@2.0.2
your url is: https://open-cameras-bet.loca.lt

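The visual impression from a scatterplot can be summarized with a correlation coefficient. The sketch below is a minimal illustration using the Pearson correlation from pandas on a hypothetical four-row sample with the same age and premium columns; the numbers are made up, so the sign and strength shown here say nothing about the real data.

```python
import pandas as pd

# Hypothetical sample for illustration only
auto = pd.DataFrame({
    "age": [18, 25, 40, 60],
    "premium": [1500, 1200, 900, 850],
})

# Pearson correlation: near +1 or -1 means a strong linear relationship,
# near 0 means little or no linear relationship
r = auto["age"].corr(auto["premium"])

print(round(r, 2))
```

In this made-up sample the coefficient is strongly negative, meaning premiums fall as age rises. A coefficient near zero would match a scatterplot with no visible trend.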

# Histogram


A histogram is a type of bar chart used to represent the distribution of numerical data by grouping data points into bins. In the example auto['Y2023'].plot(kind='hist', bins=[0,500,800,1000,5000]), it shows how the insurance premiums (Y2023) are distributed across different ranges (or bins) such as 0-500, 500-800, etc. This visualization helps to understand the frequency of premiums in each range, allowing for quick insights into data distribution.

In [5]:
%%writefile hist.py
import streamlit as st
import pandas as pd
auto = pd.read_csv("https://raw.githubusercontent.com/lcbjrrr/quantai/refs/heads/main/datasets/autoinsurance.csv")
st.write(auto)
st.pyplot(auto['Y2023'].plot(kind='hist',bins=[0,500,800,1000,5000]).figure)

Writing hist.py


In [6]:
!streamlit run /content/hist.py &>/content/logs.txt &

In [7]:
!curl https://loca.lt/mytunnelpassword

34.16.182.210

In [8]:
!npx localtunnel --port 8501 &

your url is: https://afraid-ads-fix.loca.lt

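To see what the histogram bars represent, the bin counts can be computed directly. The sketch below uses NumPy's histogram function, which follows the same binning convention as the matplotlib histogram that pandas draws: each bin includes its left edge, and the last bin also includes its right edge. The premium values are made up for illustration; only the bin edges come from the example above.

```python
import numpy as np

# Made-up premium values for illustration; none sits exactly on a bin edge
premiums = [450, 620, 790, 950, 980, 1200, 1500, 2100]
bins = [0, 500, 800, 1000, 5000]

# counts[i] is the number of values that fall between bins[i] and bins[i + 1]
counts, _ = np.histogram(premiums, bins=bins)

for left, right, n in zip(bins[:-1], bins[1:], counts):
    print(f"{left}-{right}: {n}")

print(counts.tolist())  # [1, 2, 2, 3]
```

Because the last bin (1000 to 5000) is much wider than the others, it collects more values partly by construction, which is worth keeping in mind when comparing bar heights.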

# Boxplot

A boxplot is a graphical representation of the distribution of numerical data through quartiles. It shows the minimum, first quartile (Q1), median, third quartile (Q3), and maximum, highlighting the spread and skewness of the data. Outliers are typically marked as individual points outside the "whiskers." In the example auto['Y2023'].plot(kind='box'), the boxplot visualizes the distribution of the Y2023 insurance premiums, helping to quickly assess the data's range, central tendency, and any potential outliers.

In [1]:
%%writefile box.py
import streamlit as st
import pandas as pd
auto = pd.read_csv("https://raw.githubusercontent.com/lcbjrrr/quantai/refs/heads/main/datasets/autoinsurance.csv")
st.write(auto['Y2023'].describe())
st.pyplot(auto['Y2023'].plot(kind='box').figure)

Writing box.py


In [2]:
!streamlit run /content/box.py &>/content/logs.txt &

In [3]:
!curl https://loca.lt/mytunnelpassword

35.245.38.91

In [None]:
!npx localtunnel --port 8501 &

npm warn exec The following package was not found and will be installed: localtunnel@2.0.2
your url is: https://mean-jeans-worry.loca.lt

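The numbers behind a boxplot can be reproduced with a few lines of pandas. The sketch below computes the quartiles and applies the common 1.5 × IQR rule, which is also matplotlib's default for deciding where the whiskers end and which points are drawn as outliers. The premium values are invented for illustration.

```python
import pandas as pd

# Made-up premiums for illustration, with one unusually high value
y2023 = pd.Series([800, 900, 1000, 1100, 1200, 1300, 1400, 3000])

# Quartiles (pandas uses linear interpolation by default)
q1, median, q3 = y2023.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond these fences are usually drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = y2023[(y2023 < lower_fence) | (y2023 > upper_fence)].tolist()

print(q1, median, q3)  # 975.0 1150.0 1325.0
print(outliers)        # [3000]
```

With this rule the upper whisker stops at 1400, the largest value inside the fence, rather than at the overall maximum of 3000.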

# Map

To plot geographic data points in Streamlit, use the st.map() function. The auto dataframe contains latitude (lat) and longitude (lon) columns with the geographic coordinates. The size parameter sets the size of the markers on the map, in meters, and accepts either a number or the name of a column. Calling st.map(auto, latitude="lat", longitude="lon", size=2300) renders the points on an interactive map, making it easier to explore location-based trends.

In [1]:
%%writefile map.py
import streamlit as st
import pandas as pd
auto = pd.read_csv("https://raw.githubusercontent.com/lcbjrrr/quantai/refs/heads/main/datasets/autoinsurance.csv")
#st.write(auto)
st.map(auto, latitude="lat", longitude="lon", size=2300)

Writing map.py


In [2]:
!streamlit run /content/map.py t &>/content/logs.txt &

In [3]:
!curl https://loca.lt/mytunnelpassword

34.106.191.213

In [4]:
!npx localtunnel --port 8501 &

npm warn exec The following package was not found and will be installed: localtunnel@2.0.2
your url is: https://plain-vans-spend.loca.lt

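Map plotting depends on clean coordinates, and rows with missing or out-of-range values can cause errors or misplaced points. As a hedged sketch, the snippet below filters a dataframe before it is passed to st.map(). The sample rows and the approximate coordinates are illustrative assumptions, not values from the dataset; only the lat and lon column names come from the example above.

```python
import pandas as pd

# Illustrative rows with approximate coordinates; one latitude is missing
auto = pd.DataFrame({
    "state": ["NY", "OH", "TX"],
    "lat": [42.65, None, 30.27],
    "lon": [-73.76, -82.99, -97.74],
})

# Drop rows without coordinates, then keep only valid latitude/longitude ranges
clean = auto.dropna(subset=["lat", "lon"])
clean = clean[clean["lat"].between(-90, 90) & clean["lon"].between(-180, 180)]

print(clean["state"].tolist())  # ['NY', 'TX']
```

The cleaned dataframe could then be plotted with st.map(clean, latitude="lat", longitude="lon").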

# Activity: Streamlit

***dataset***: This dataset contains information about travel insurance policies and related travel products sold by different agencies.

**data**: `https://raw.githubusercontent.com/lcbjrrr/quantai/refs/heads/main/activities/travel%20-%20insurance.csv`

The columns, and what they likely represent, are:

*   **Agency:** The code or name of the agency selling the product (e.g., CBH, JZI, KML, C2B).
*   **Agency Type:** The type of agency, categorized as either "Travel Agency" or "Airlines." This indicates whether the seller is a dedicated travel agency or an airline selling insurance as an ancillary product.
*   **Distribution Channel:** How the product was sold: "Offline" (e.g., in person, over the phone) or "Online" (e.g., through a website or app).
*   **Product Name:** The name of the insurance plan or travel product (e.g., Comprehensive Plan, Value Plan, Basic Plan, Premier Plan, Bronze Plan, Silver Plan). This suggests different tiers or types of coverage.
*   **Claim:** Indicates whether a claim was filed for the policy: "Yes" or "No." This is a crucial indicator of whether the insurance was used.
*   **Duration:** The duration of the policy or trip, likely in days.
*   **Destination:** The destination of the trip.
*   **Net Sales:** The net sales amount, which can be positive or negative. Negative values might indicate refunds or cancellations.
*   **Commision (in value):** The commission earned by the agency for selling the product.
*   **Gender:** The gender of the customer (M or F).
*   **Age:** The age of the customer.
