# Python for Data Analysts 
### Analyse and visualise data with Snowpark for Python and Streamlit


Here is an introduction to some handy Snowpark techniques and how you can use them to help with visualisation.

### What is Snowpark

![df](https://storage.googleapis.com/gae-wp-itsol-prd.appspot.com/1/Snowflake-Blog_vol.014_1.png)

Snowpark is a set of APIs that let you transform, model and perform operations on your data using a language of your choice.  Natively, Snowpark supports Python, Scala, Java and SQL.  If you would like to use an additional language, you can do so with Snowpark Container Services.  This tutorial will get you familiar with Snowpark dataframes for Python.

https://docs.snowflake.com/en/developer-guide/snowpark/index

#### Importing the relevant Libraries

We will import the Python libraries.  If you wish to use a notebook outside of Snowflake, you can **pip install** the Snowpark library using the following link:

https://pypi.org/project/snowflake-snowpark-python/

The main difference between using Snowpark outside of the Snowflake UI (Snowsight) and using it within Snowsight is how you connect to the data. If you work with Snowflake data in your own interface, such as Jupyter, and want to use Snowpark, you will need to set up a session, which requires credentials to be passed. How this is achieved is documented on the PyPI page.  In this notebook, we will be using the connection which is dynamically provided by the active session.  The session was already authenticated when you logged into Snowflake, so it passes on any of the relevant security privileges and constraints.  The other difference is that, in order to render the results, the notebook leverages Streamlit, which is built into Snowflake.  Whether you choose to use Snowpark with the provided UI or your own, Snowpark still provides the same governance and powerful compute provided by Snowflake, which allows for fast, secure analytics.

So, without further ado, run the next cell to load the required libraries.

In [None]:
# Import streamlit
import streamlit as st

# We can also use Snowpark for our analyses!
from snowflake.snowpark.context import get_active_session
session = get_active_session()
st.markdown('#### Hello, welcome to snowpark')
st.markdown('You will see that we have imported **get_active_session** from snowpark and have also imported streamlit to render any results')

#### Writing SQL
If you want to write SQL in Snowpark, you can do this by using **session.sql**.  You can create any query across any schema / database / view in Snowflake using this syntax.  Below, I have created a query and saved it as a dataframe.  I then called back the dataframe to retrieve the results.

In [None]:
SELECT * FROM DEFAULT_SCHEMA.SYNTHETIC_POPULATION LIMIT 5

In [None]:
dataframe = session.sql('select * from DEFAULT_SCHEMA.SYNTHETIC_POPULATION')

dataframe.limit(5)

You can view the distinct values of any field in the dataframe.  Below you will see the distinct values for marital status.

In [None]:
dataframe.select('MARITAL_STATUS').distinct()

### Exercise
Create a list of distinct values for the field **OCCUPATION**

#### Export out to Pandas
Any dataframe you make in snowpark can be exported out as a pandas dataframe. You simply add **to_pandas()** at the end of the dataframe and you very quickly get a pandas dataframe


**What is pandas?**
pandas is a Python library used for working with data sets.  It has functions for analysing, cleaning, exploring, and manipulating data.  Below you will see the Snowpark dataframe converted to pandas, then you will see the data types associated with each column in the dataset.  

From our perspective, pandas is useful for visualisation purposes.  Many of the Python visualisation libraries work with pandas.  The downside of pandas is that it processes data in the notebook's own memory, so it does not scale well to large datasets.  For large-scale data engineering, work with Snowpark dataframes, as they leverage parallel processing and all the standard functionality that Snowflake offers.

In [None]:
dataframepd = dataframe.limit(100).to_pandas()

st.markdown('##### This is a dataframe converted to pandas')
st.write(dataframepd.head())
st.markdown('##### These are the pandas datatypes')
st.write(dataframepd.dtypes)

Likewise, you might want to take a pandas dataframe and load it as a Snowpark dataframe.  In reality, this could be data that started life as a CSV file which you loaded, or data that originated from Snowflake and is being re-imported into a Snowpark dataframe.

In [None]:
dataframe_2 = session.create_dataframe(dataframepd)
dataframe_2

Now that it is back as a Snowpark dataframe, let's inspect the columns along with their data types.

In [None]:
dataframe_2.print_schema()

At the beginning, we created a dataframe using SQL.   We can also create a dataframe by simply specifying the table name. **Note**  You will see that the table holds over 64 million records.  We worked this out easily with the **count()** feature.  As we know there are a lot of rows, we have limited the result set by adding **limit(5)** to the dataframe

In [None]:
dataframe = session.table('DEFAULT_SCHEMA.SYNTHETIC_POPULATION')

st.write(dataframe.count())
st.write(dataframe.limit(5))


You might have noticed that there is a column called 'MULTIPLE_MOBILITIES' - this is meant to say 'MULTIPLE_MORBIDITIES'.  It's simple to rename any column - look at how we do this below:

In [None]:
dataframe = dataframe.with_column_renamed('MULTIPLE_MOBILITIES','MULTIPLE_MORBIDITIES')
dataframe.limit(5)

It is easy to select the columns you want in a dataframe by using **select** after the dataframe name.  As the table holds many millions of rows, we will limit the results to the first 10 rows.

In [None]:
dataframe.select('FIRST_NAME','LAST_NAME','PRACTICE_CODE','PRACTICE_NAME').limit(10)

Use the select construct to select some columns of your choice.  In python, I have added the dataframe.columns command to give you an idea of the columns you may want to select.

In [None]:
#here is the command to reveal the columns
dataframe.columns

### below, write the correct python to select 5 columns from the dataframe, and limit the rows to 10

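#### Using Snowflake functions in Python
Any function, whether built in or custom, can be used with Snowpark dataframes. It's easiest to import them so that they are accessible, using the following.  Note that this wildcard import replaces Python built-ins such as **sum** and **max** with their Snowpark equivalents for the rest of the notebook.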


In [None]:
from snowflake.snowpark.functions import *

#### Let's add a new column for age.  This can be calculated from the date of birth. 

Firstly, you will use the **with_column** construct to create a new column; the new column is calculated from an **expression**.  This expression uses the built-in **datediff** function.  Most Snowflake SQL functions are available in Snowpark for Python through **snowflake.snowpark.functions**, and others can be invoked by name with **call_function**.

Below we are using the datediff function to calculate the citizen's age in years.  After this, I quickly displayed the results by doing the following:
- selecting the first 5 rows and a subset of columns which includes age


In [None]:
dataframe = dataframe.with_column('AGE',datediff('year',col('DATE_OF_BIRTH'),current_date()))

st.dataframe(dataframe.select('NHS_NUMBER','FIRST_NAME','LAST_NAME','AGE').limit(5))
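
One thing worth knowing about the calculation above: as far as I understand Snowflake's DATEDIFF, the 'year' part counts the year boundaries crossed rather than the full years elapsed, so someone whose birthday is still to come this year is counted as a year older.  The plain Python sketch below illustrates the difference.  The helper names `datediff_year` and `age_in_full_years` are made up for illustration and are not part of Snowpark.

```python
from datetime import date


def datediff_year(start: date, end: date) -> int:
    """Mimic DATEDIFF('year', start, end): count year boundaries crossed."""
    return end.year - start.year


def age_in_full_years(born: date, today: date) -> int:
    """Age in completed years: subtract one if the birthday is still to come."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


born = date(2000, 12, 31)
today = date(2024, 6, 1)
print(datediff_year(born, today))      # 24
print(age_in_full_years(born, today))  # 23
```

For grouping by age band this difference rarely matters, but it is worth bearing in mind if you filter on an exact age threshold.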

Now that we have selected the right columns and included an additional column for age, we will group the data by age, then sort the grouped dataframe by age.

- grouping the ages with a count and sum to view total morbidities and the count of all people that have that age.
- in the view by age, we have sorted by age with the 'sort' function.

In [None]:
dataframe.group_by('AGE').agg(sum('MULTIPLE_MORBIDITIES').alias('"Total Morbidities"'),count('*').alias('"Total Citizens"')).sort('AGE')

Let's filter some of the data from the dataset.  I have decided not to have under 5 year olds in my analysis, therefore I will filter them out into a new dataframe.  Next, the result is saved to a new table, and finally we will view a sample of the under-5 records that were excluded.  Notice we are using select again to preview only the columns we want.


In [None]:
dataframe_over_5 = dataframe.filter(col('AGE')>=5)
dataframe_over_5.write.mode('overwrite').save_as_table('DEFAULT_SCHEMA."Population Health Synthetic Data_over_5"')
dataframe.filter(col('AGE')<5).limit(10).select('NHS_NUMBER','FIRST_NAME','LAST_NAME','AGE')

Snowpark dataframes make it easy to create lots of components for different purposes.  You would have previously used the distinct function to view all occupation types.  You could use this to populate a select box for dynamic filtering purposes.

In [None]:
dataframe = session.table('DEFAULT_SCHEMA."Population Health Synthetic Data_over_5"')
occupations = dataframe.select('OCCUPATION').distinct()
st.selectbox('Choose Your Occupation:',occupations)

Let's see this new dataframe in action.  The value chosen in the select box is stored in a variable called filter.  This variable is used to filter the occupation in the dataset.

In [None]:
filter = st.selectbox('Choose Occupation:',occupations)

filtered_df = dataframe.filter(col('OCCUPATION')==filter)
filtered_df.limit(5)

#### Now we have selected a few results, let's group the filtered dataset by marital status. 

In [None]:
filtered_df.group_by('MARITAL_STATUS').agg(sum('"MULTIPLE_MORBIDITIES"'))

We will use the aggregation construct to summarise more figures based on the marital status grouping.

In [None]:
filtered_df.group_by('MARITAL_STATUS').agg(sum('MULTIPLE_MORBIDITIES'),
                                           sum('CANCER'),
                                           sum('DIABETES'),
                                           sum('COPD'),
                                           sum('ASTHMA'))

Next, let's rename the columns to make them clearer.  This is done using the **alias** feature.

In [None]:
dataframe.group_by('MARITAL_STATUS').agg(sum('MULTIPLE_MORBIDITIES').alias('"Multiple Morbidities"'),
                                         sum('CANCER').alias('"Cancer"'),
                                         sum('DIABETES').alias('"Diabetes"'),
                                         sum('COPD').alias('"COPD"'),
                                         sum('ASTHMA').alias('"Asthma"'))

It is easy to view comparisons of results side by side.  But what if you would like to do it using data across multiple rows?  This is where the **pivot** function can be used.

In [None]:
pivot_selection = dataframe.select('ICB22NM','PRACTICE_NAME','MARITAL_STATUS','CANCER')

fields = dataframe.select('MARITAL_STATUS').distinct()

pivotted = pivot_selection.pivot(col('MARITAL_STATUS'), fields).sum('"CANCER"')

pivotted
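
If the idea of a pivot is new to you, the plain Python sketch below shows the reshaping it performs: values from one column (marital status) become column headings, and a measure (cancer) is summed into each one.  The rows are made-up sample data, not taken from the table, and this is only an illustration of the concept rather than how Snowflake implements it.

```python
from collections import defaultdict

# Made-up rows in the "long" shape: (practice, marital_status, cancer_count)
rows = [
    ("Practice A", "SINGLE", 3),
    ("Practice A", "MARRIED", 5),
    ("Practice B", "SINGLE", 2),
    ("Practice A", "SINGLE", 1),
]

# One output row per practice, one column per marital status, summing CANCER
pivoted = defaultdict(lambda: defaultdict(int))
for practice, status, cancer in rows:
    pivoted[practice][status] += cancer

wide = {practice: dict(cols) for practice, cols in pivoted.items()}
print(wide)
```

Notice that Practice B has no MARRIED entry.  In a pivoted dataframe I would expect a combination with no rows to show as an empty (NULL) cell.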

#### Mass cleaning of column names
As Snowpark is programmable, you can leverage Python functions for the cleaning of multiple column names.  In this case we are converting to upper case.  We have imported the DataFrame object in order to make changes to the structure of the dataframe.

Below, a **Python function** is used to change the column names and remove undesired characters.  All Python functions are defined with the **def** keyword. 

In [None]:
from snowflake.snowpark import DataFrame

def clean_col_names(df: DataFrame) -> DataFrame:
    '''
        removes annoying single & double quotes in the column names
        also makes all characters in the column names uppercase
    '''
    df_res = df
    for col in df.columns:
        df_res = df_res.with_column_renamed(col, col.upper().replace('\'', '').replace('"', ''))
        
    return df_res

We will run the function that has just been defined and call the resulting dataframe **pivotted**.

In [None]:
pivotted = clean_col_names(pivotted)
pivotted.limit(100)
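
To see what the renaming rule does to an individual name, here is a small plain Python sketch that applies the same string operations as **clean_col_names** to a few sample names.  The sample names are assumptions for illustration; pivoted columns typically come back wrapped in both double and single quotes, which is what the function strips away.

```python
def clean_name(name: str) -> str:
    """Apply the same rule as clean_col_names to a single column name."""
    return name.upper().replace("'", "").replace('"', "")


# Assumed examples of column names before cleaning
sample = ['"\'SINGLE\'"', '"\'Widowed\'"', "PRACTICE_NAME"]
print([clean_name(n) for n in sample])
# ['SINGLE', 'WIDOWED', 'PRACTICE_NAME']
```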

#### Putting data in a chart
If you want to move the data into a chart you need to convert the result set to pandas first by using the **to_pandas()** feature.  Only send what you need to visualise after doing filtering/transformations in snowpark.

In [None]:
import altair as alt



c = (
   alt.Chart(pivotted.to_pandas())
   .mark_circle()
   .encode(x=" MARRIED OR IN A SAME SEX CIVIL PARTNERSHIP",
           y="SINGLE", 
           size="SEPERATED (BUT STILL LEGALLY MARRIED OR STILL LEGALLY IN A SAME-SEX CIVIL PARTNERSHIP)", 
           color="<16 YEARS OLD THEREFORE INELIGIBLE TO MARRY", tooltip=["PRACTICE_NAME"])
)

st.altair_chart(c, use_container_width=True)


Below we are providing interactivity on both the ICB and the type of morbidity by parameterising the snowpark dataframe during the transformation process.  The last step is to convert to pandas.

In [None]:
distinct_icb = dataframe.select('ICB22NM').distinct()

s_icb = st.selectbox('choose icb:',distinct_icb)

s_morbidities = st.selectbox('select morbidity',['ASTHMA','COPD','DIABETES','CANCER','MULTIPLE_MORBIDITIES'])
pivot_selection = dataframe.select('ICB22NM','PRACTICE_NAME','MARITAL_STATUS',f'{s_morbidities}')
#pivot_selection = filtered_df.select('ICB22NM','PRACTICE_NAME','MARITAL_STATUS',)

pivotted = pivot_selection.pivot(col('MARITAL_STATUS'), fields).sum(f'"{s_morbidities}"')
pivotted = clean_col_names(pivotted)


pivottedpd = pivotted.filter(col('ICB22NM')==s_icb).to_pandas()


import altair as alt

c = (
   alt.Chart(pivottedpd)
   .mark_circle()
   .encode(x=" MARRIED OR IN A SAME SEX CIVIL PARTNERSHIP",
           y="SINGLE", 
           size="SEPERATED (BUT STILL LEGALLY MARRIED OR STILL LEGALLY IN A SAME-SEX CIVIL PARTNERSHIP)", 
           color="<16 YEARS OLD THEREFORE INELIGIBLE TO MARRY", tooltip=["PRACTICE_NAME"])
)

st.markdown('Patients by Marital Status')
st.altair_chart(c, use_container_width=True)

#### More on Filtering
Below we can see data which is filtered to everyone between 11 and 16 years of age.  This is using the **between** function.

In [None]:
st.markdown('Filter Results based on the age being between 11 and 16')
st.write(dataframe.filter(col('Age').between(11,16)).sample(0.1).limit(1000))
st.markdown('NB - we are limiting on 1000 records based on a 10% random sample')

Here we are filtering on practices whose code begins with Y. For this, it's easy to use the **.like** feature where the **%** is used as a wildcard.

In [None]:
st.markdown('Filter Results based on Practice Code starting with Y')
st.write(dataframe.filter(col('PRACTICE_CODE').like('Y%')).sample(0.1).limit(1000))
st.markdown('NB - we are limiting on 1000 records based on a 10% random sample')
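
For anyone unfamiliar with LIKE patterns, the plain Python sketch below mimics the matching rules: **%** stands for any run of characters and **_** for exactly one character, and the match is case sensitive.  The helper names and the practice codes are made up for illustration, and escape characters are not handled.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into an equivalent regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches any run of characters
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def sql_like(value: str, pattern: str) -> bool:
    """Return True when the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


codes = ["Y02045", "B86110", "Y00001", "y12345"]  # made-up practice codes
print([c for c in codes if sql_like(c, "Y%")])
# ['Y02045', 'Y00001']
```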

Here, we are filtering using multiple filters - looking at records where the general health code is less than or equal to 2 and the sex is male.

In [None]:
st.markdown('Filter Results based on Health Code being less than or equal to 2 and Sex being Male')
st.write(dataframe.filter((col('GENERAL_HEALTH_CODE') <=2)& (col('SEX') =='Male')).sample(0.1).limit(1000))
st.markdown('NB - we are limiting on 1000 records based on a 10% random sample')
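
The brackets around each condition in the filter above matter.  In Python, **&** binds more tightly than comparisons such as **<=** and **==**, so leaving the brackets out changes how the expression is read.  The sketch below uses plain integers to show the effect; with Snowpark columns I would expect the version without brackets to fail or to build a different filter from the one intended.

```python
# The same precedence rules apply to integers as to Snowpark column expressions.
with_parens = (1 <= 2) & (3 == 3)   # True & True
without_parens = 1 <= 2 & 3 == 3    # read as 1 <= (2 & 3) == 3, i.e. 1 <= 2 == 3

print(with_parens)     # True
print(without_parens)  # False
```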

#### Putting Data into Buckets

There are many ways to group data together into buckets in order to do specific types of analysis.  Below are 3 examples.

The first is bucketing by time slices, which here means grouping the date of birth into 3 month intervals.  Snowflake has powerful time series analysis, and this can be leveraged through SQL as well as Snowpark.

In [None]:
time_slice = dataframe.select('DATE_OF_BIRTH',call_function('time_slice',col('DATE_OF_BIRTH'),3,'MONTH','END').alias('month_interval'),'MULTIPLE_MORBIDITIES')
time_slice = time_slice.group_by('month_interval').agg(sum('MULTIPLE_MORBIDITIES').alias('MULTIPLE_MORBIDITIES'))
st.markdown('Date of Birth vs multiple morbidities')
st.line_chart(time_slice,y='MULTIPLE_MORBIDITIES',x='MONTH_INTERVAL')
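
To give a feel for what the time slice call returns, here is a plain Python approximation for month-based slices.  It rests on my understanding of the documented behaviour: slices are aligned to 1 January 1970, and 'END' returns the start of the following slice.  The function name `month_slice` is made up for this sketch.

```python
from datetime import date


def month_slice(d: date, width: int = 3, end: bool = True) -> date:
    """Approximate TIME_SLICE(d, width, 'MONTH', 'START' or 'END')."""
    months_since_epoch = (d.year - 1970) * 12 + (d.month - 1)
    start = (months_since_epoch // width) * width
    if end:
        start += width  # the end of a slice is the start of the next one
    return date(1970 + start // 12, start % 12 + 1, 1)


print(month_slice(date(1985, 2, 14)))             # 1985-04-01
print(month_slice(date(1985, 2, 14), end=False))  # 1985-01-01
```

So everyone born in January, February or March 1985 would land in the same bucket, labelled 1985-04-01.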


The second bucket is a width bucket.  This time we will bucket on the actual age and will build a bar chart on this.  **st.bar_chart** is used.

In [None]:
age_bucket = dataframe.select('MULTIPLE_MORBIDITIES',
                                 call_function('WIDTH_BUCKET',
                                               col('AGE'),0,110,6).alias('AGE_BUCKET'))\
.group_by('AGE_BUCKET').agg(sum('MULTIPLE_MORBIDITIES').alias('MULTIPLE_MORBIDITIES'))

st.bar_chart(age_bucket.sort('AGE_BUCKET'),x='AGE_BUCKET',y='MULTIPLE_MORBIDITIES')
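
The arithmetic behind the width bucket can be sketched in plain Python.  With a range of 0 to 110 split into 6 buckets, each bucket is a little over 18 years wide.  This is an approximation of how I understand WIDTH_BUCKET to behave (values below the range go to bucket 0 and values at or above the top go to an extra overflow bucket), and the function below is my own stand-in, not the Snowflake implementation.

```python
import math


def width_bucket(value: float, low: float, high: float, buckets: int) -> int:
    """Approximate WIDTH_BUCKET: equal-width buckets numbered from 1."""
    if value < low:
        return 0              # underflow bucket
    if value >= high:
        return buckets + 1    # overflow bucket
    return math.floor((value - low) * buckets / (high - low)) + 1


for age in (5, 18, 19, 64, 109, 110):
    print(age, width_bucket(age, 0, 110, 6))
```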


The third bucket is an H3 bucket, grouping all the places to an H3 index.  This will create hexagons across England.  The function we are using for this grouping is **H3_LATLNG_TO_CELL_STRING**. This will represent a hexagon shaped 'bucket' at a given resolution (size).  Visualising this sort of data is best performed using a map.  **Pydeck** is a useful Python library for visualising location data.

In [None]:
import pydeck as pdk
H3 = dataframe.select('MULTIPLE_MORBIDITIES','LAT','LON',
                                 call_function('H3_LATLNG_TO_CELL_STRING',
                                               col('LAT'),col('LON'),5).alias('H3'))
H3 = H3.group_by('H3').agg(sum('MULTIPLE_MORBIDITIES').alias('MULTIPLE_MORBIDITIES'), avg('LAT').alias('LAT'),
                          avg('LON').alias('LON'))

TOTALS = H3.agg(avg('LAT').alias('LAT'),
                         avg('LON').alias('LON'),
                         max('MULTIPLE_MORBIDITIES').alias('MULTIPLE_MORBIDITIES')).to_pandas()

tot_morbidities = float(TOTALS.MULTIPLE_MORBIDITIES.iloc[0])


H3 = H3.with_column('M_PERC', div0(col('MULTIPLE_MORBIDITIES'),lit(tot_morbidities)))
h3pd = H3.to_pandas()





morbidities = pdk.Layer(
        "H3HexagonLayer",
        h3pd,
        pickable=True,
        stroked=True,
        filled=True,
        extruded=False,
        get_hexagon="H3",
        get_fill_color=["255 * M_PERC","50","140 - 140 * M_PERC"],
        line_width_min_pixels=2,
        opacity=0.4)


st.pydeck_chart(pdk.Deck(
    map_style=None,
    initial_view_state=pdk.ViewState(
        latitude=TOTALS.LAT.iloc[0],
        longitude=TOTALS.LON.iloc[0],
        zoom=5,
        height=400
        ),
    
layers= [morbidities], tooltip = {'text':"Morbidities: {MULTIPLE_MORBIDITIES}"}

))


#### Get another Dataset from the gov website, create a dataframe and join to our existing dataset.
Before we get anything from the internet we will need to enable an existing **External Access Integration**. 

- Click on the 3 dots in the top right hand corner of the notebook and navigate to **notebook settings**
- Under external access, enable **UK_GOV_PUBLISHING_SERVICE**
- Also, enable **GEOPORTAL** external integration which is used in a later step.

The next step downloads the CSV using the Python csv and requests packages, and the result set is then loaded into a Snowpark dataframe.  You will note that I am using **cache_result**.  This effectively inserts all the data into a temporary table, which allows us to call it back quickly.

Just to highlight, we are using the standard Python **requests** package to call data externally. We then use the **csv** library to read the CSV file with a comma as the delimiter, and store the rows in a variable called my_list.  Finally, we will import that list into a new Snowpark dataframe.


In [None]:
import streamlit as st
# We can also use Snowpark for our analyses!

import csv
import requests
from snowflake.snowpark.functions import *
from snowflake.snowpark.types import *
session = get_active_session()

CSV_URL = 'https://assets.publishing.service.gov.uk/media/5dc407b440f0b6379a7acc8d/File_7_-_All_IoD2019_Scores__Ranks__Deciles_and_Population_Denominators_3.csv'


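with requests.Session() as s:
    download = s.get(CSV_URL)
    download.raise_for_status()  # stop here if the download failed

    decoded_content = download.content.decode('utf-8')
    # DictReader uses the header row as keys, giving one dict per CSV row
    cr = csv.DictReader(decoded_content.splitlines(), delimiter=',')
    my_list = list(cr)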


deprivation = session.create_dataframe(my_list).cache_result()
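
If you would like to see what the csv step produces without downloading anything, the sketch below runs the same **DictReader** logic over a tiny made-up extract.  The two data rows are invented for illustration; only the parsing approach matches the cell above.

```python
import csv

# A tiny made-up extract standing in for the downloaded file contents
decoded_content = (
    "LSOA code (2011),Living Environment Score\n"
    "E01000001,29.5\n"
    "E01000002,31.2\n"
)

reader = csv.DictReader(decoded_content.splitlines(), delimiter=",")
my_list = list(reader)

print(my_list[0])
# {'LSOA code (2011)': 'E01000001', 'Living Environment Score': '29.5'}
```

Notice that every value is read as text, including the scores.  If you need to do arithmetic on a column loaded this way, it may need casting to a numeric type first.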


Let's view the new dataframe as well as its columns.

In [None]:
st.markdown('A preview of the data')
st.write(deprivation.limit(5))

st.markdown('All the columns')
st.write(deprivation.columns)

We have selected columns previously, the same can be applied to this newly created dataframe.

In [None]:
s_deprivation = deprivation.select("\"LSOA code (2011)\"","\"LSOA name (2011)\"",
                                   '"Health Deprivation and Disability Score"',
                                   '"Living Environment Score"',
                                   '"Total population: mid 2015 (excluding prisoners)"')
st.write(s_deprivation.limit(5))

In [None]:
st.write('Before we can sensibly join the datasets together, we must group the original dataset so it is at the same level of aggregation as the deprivation dataset')
dataframe = session.table('DEFAULT_DATABASE.DEFAULT_SCHEMA."Population Health Synthetic Data_over_5"')
data_grouped = dataframe.group_by('LSOA_CODE','ICB22NM').agg(sum('MULTIPLE_MORBIDITIES').alias('MULTIPLE_MORBIDITIES'),sum('ASTHMA').alias('ASTHMA'))
data_grouped

In [None]:
st.markdown('Next, we join the two dataframes together.')

all_data = data_grouped.join(s_deprivation,data_grouped['LSOA_CODE']==s_deprivation["\"LSOA code (2011)\""]).drop("\"LSOA code (2011)\"")

all_data.write.mode('overwrite').save_as_table('DEFAULT_SCHEMA."Morbidities and Deprivation by LSOA"')

all_data

#### Add Polygons to represent the LSOA boundaries and join to dataset

As before, we need to get the polygons from an external source, which requires an external access integration.  Once this is in place, you simply allow the notebook to use it.  You need to ensure that the **GEOPORTAL** integration is enabled for the notebook.

This integration lets the notebook reach the ArcGIS service that hosts the data we need from the ONS geoportal:
https://geoportal.statistics.gov.uk/datasets/ons::lower-layer-super-output-areas-december-2011-boundaries-ew-bsc-v4/about

Geopandas is used to convert the geometries into binary (WKB) format, and the result is then pushed into a Snowpark dataframe.  I then converted the binary polygon into a Snowflake geography column and selected only the LSOA code and the polygon.

In [None]:
import streamlit as st
import requests
import geopandas as gpd
from snowflake.snowpark.functions import *
from snowflake.snowpark.types import *

session = get_active_session()

# URL of the ArcGIS web service (replace with your actual URL)
url='https://services1.arcgis.com/ESMARspQHYMw9BZ9/arcgis/rest/services/LSOA_2011_Boundaries_Super_Generalised_Clipped_BSC_EW_V4/FeatureServer/0/query'

params = {
    "where": "1=1",  # Example query to fetch all records
    "outFields": "*",  # Fetch all fields
    "f": "geojson"  # Specify the format as GeoJSON
}

# Make the request to the ArcGIS web service
response = requests.get(url, params=params)

# Check if the request was successful
if response.status_code == 200:
    # Load the GeoJSON data into a GeoDataFrame
    gdf = gpd.read_file(response.text)
    
    # Convert the geometries to WKB and load the result into a cached Snowpark dataframe
    polygons = session.create_dataframe(gdf.to_wkb()).cache_result()
    polygons = polygons.with_column('GEOM',to_geography(col('"geometry"'))).select('LSOA11CD','GEOM')
else:
    print("Failed to fetch data:", response.status_code)

polygons


#### Creating a function in Python

You can convert this Python into a function which you can use in SQL too.  Converting to a function allows the logic to be reused.  In this case we are reusing the functionality to create a connector to the geoportal API.  This function uses Python, but you can create functions in Java or SQL too.

In [None]:
CREATE OR REPLACE FUNCTION DEFAULT_SCHEMA.GET_LSOA_POLYGON()
RETURNS TABLE (LSOA11CD STRING, GEOM string)
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
HANDLER = 'arcgisservicedata'
PACKAGES = ('requests', 'geopandas', 'shapely')
EXTERNAL_ACCESS_INTEGRATIONS = (GEOPORTAL)
AS
$$
import requests
import geopandas as gpd
from shapely import wkb
import pandas as pd

class arcgisservicedata:
    def process(self):
        url = 'https://services1.arcgis.com/ESMARspQHYMw9BZ9/arcgis/rest/services/LSOA_2011_Boundaries_Super_Generalised_Clipped_BSC_EW_V4/FeatureServer/0/query'
                  
    
        # Parameters for the request
        params = {
        "where": "1=1",        # Example query to fetch all records
        "outFields": "*",      # Fetch all fields
        "f": "geojson"         # Specify the format as GeoJSON
        }

        # Fetch data from the ArcGIS service
        response = requests.get(url, params=params)

        # Check for a successful response
        if response.status_code == 200:
            # Load the GeoJSON data into a GeoDataFrame
            gdf = gpd.read_file(response.text)

            # Convert the geometry to WKB format and prepare output data
            gdf['GEOM'] = gdf['geometry'].apply(lambda x: x.wkb_hex)

            # Prepare columns to output
            output = pd.DataFrame({
                'LSOA11CD': gdf['LSOA11CD'],
                'GEOM': gdf['GEOM']
            })
        
            # Yield each row as a tuple
            for _, row in output.iterrows():
                yield (row['LSOA11CD'], row['GEOM'])

        else:
            raise ValueError(f"Failed to fetch data: {response.status_code}")
$$;
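
The handler in the function above is an ordinary Python class whose **process** method yields one tuple per output row, in the same order as the columns in RETURNS TABLE.  That means the shape of the logic can be tried out locally before it is wrapped in SQL.  The sketch below is a hypothetical stand-in: the class name and the placeholder values are made up, and it makes no network call.

```python
class ExampleHandler:
    """Hypothetical handler with the same shape as arcgisservicedata."""

    def process(self):
        # Stand-in rows; the real handler builds these from the GeoJSON response
        rows = [
            ("E01000001", "placeholder-wkb-hex-1"),
            ("E01000002", "placeholder-wkb-hex-2"),
        ]
        # Yield each row as a (LSOA11CD, GEOM) tuple
        for code, geom in rows:
            yield (code, geom)


result = list(ExampleHandler().process())
print(result)
```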


#### Using the Created function in SQL

In [None]:
select LSOA11CD, to_geography(GEOM) from table(DEFAULT_SCHEMA.GET_LSOA_POLYGON())

We are now using the join feature to join the polygons back onto the dataset.  This dataframe is now ready to be converted into a persisted table, which can then be used in other visualisation tools such as Tableau.

In [None]:
all_data = session.table('DEFAULT_SCHEMA."Morbidities and Deprivation by LSOA"')

all_data_polygons = all_data.join(polygons,all_data['LSOA_CODE']==polygons['LSOA11CD']).drop('LSOA11CD')

all_data_polygons.limit(10)

Create a new table from the dataframe

In [None]:
all_data_polygons.write.mode('overwrite').save_as_table('"Health and Deprivation by LSOA"')

In [None]:
st.markdown('Here, we can now call the table into a new dataframe')

session.table('"Health and Deprivation by LSOA"').limit(5)

Now that we have saved the data as a table, we can view the results in SQL.  You will also note that I leveraged the **ST_ASWKT** function, which is useful for rendering in **matplotlib** and also Power BI's Icon Map.  https://www.iconmappro.com/

In [None]:
select *,ST_ASWKT(GEOM)WKT from "Health and Deprivation by LSOA"  WHERE ICB22NM like '%London Integrated Care Board';

In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt
import streamlit as st

st.markdown('Here is a comparison of London, comparing environmental scores with **Asthma** and those with \
**Multiple Morbidities**.  **NB** There are gaps in the underlying data, which makes the map a useful way of easily spotting quality issues. I believe the gaps might be to do with coding changes')
# Convert the SQL cell result to a pandas-backed GeoDataFrame
# (assumes the previous SQL cell is named `geo` in the notebook)
geom = gpd.GeoDataFrame(geo.to_df().to_pandas())

# Set geometry from WKT and define CRS
geodframe = geom.set_geometry(gpd.GeoSeries.from_wkt(geom['WKT']))
geodframe.crs = "EPSG:4326"

# Create three columns in Streamlit
col1, col2, col3 = st.columns(3)

# Plot Living Environment Score
with col1:
    fig1, ax1 = plt.subplots(1)
    ax1.axis('off')
    geodframe.plot(column='Living Environment Score', cmap='Reds', alpha=1, ax=ax1)
    ax1.set_title('Living Environment Score')
    st.pyplot(fig1)
    plt.close(fig1)

# Plot ASTHMA with Greens colormap
with col2:
    fig2, ax2 = plt.subplots(1)
    ax2.axis('off')
    geodframe.plot(column='ASTHMA', cmap='Greens', alpha=1, ax=ax2)
    ax2.set_title('Asthma Sufferers')
    st.pyplot(fig2)
    plt.close(fig2)

# Plot MULTIPLE_MORBIDITIES with Oranges colormap
with col3:
    fig3, ax3 = plt.subplots(1)
    ax3.axis('off')
    geodframe.plot(column='MULTIPLE_MORBIDITIES', cmap='Oranges', alpha=1, ax=ax3)
    ax3.set_title('Multiple Morbidities')
    st.pyplot(fig3)
    plt.close(fig3)


As you have been going through this notebook, you will have seen how some of the Streamlit objects work.  The last cell you ran introduced you to visualising maps, as well as putting each map in one of 3 columns, and applying markdown as narrative.  All of this work can be used as a starting point to build a Streamlit application.  Please now navigate to **Projects > Streamlits > Population_Health** and see what can be achieved.  Streamlit is a Python-based application framework which is designed for analysts who know Python and want to share their data application with the wider community.