<div id="singlestore-header" style="display: flex; background-color: rgba(209, 153, 255, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/notes.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <div id="badge" style="display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%">SingleStore Notebooks</div>
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">Getting Started with Notebooks</h1>
    </div>
</div>

<table style="border: 0; border-spacing: 0; width: 100%; background-color: #03010D"><tr>
    <td style="padding: 0; margin: 0; background-color: #03010D; width: 33%; text-align: center"><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-vertical.png" style="height: 200px;"/></td>
    <td style="padding: 0; margin: 0; width: 66%; background-color: #03010D; text-align: right"><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-jupyter.png" style="height: 250px"/></td>
</tr></table>

## What you will learn in this notebook:

1. Load a CSV file from our GitHub repo [Python]
2. Ingest that file into SingleStoreDB without defining the schema [Python]
3. Interact natively with the database using SQL [SQL]
4. Convert results to a DataFrame and visualize results with Plotly [Python]

## Questions?

Reach out to us through our [forum](https://www.singlestore.com/forum).

## Enhance your notebooks with visualizations

## 1. Import libraries for reading data into a DataFrame

Our data set contains geographic data, so we also install [Shapely](https://shapely.readthedocs.io/en/stable/)
to store that data in Shapely geometry objects.

In [1]:
!pip3 install shapely --quiet

import pandas as pd
import shapely.wkt

## 2. Load a CSV file hosted on GitHub using Python

Notice that we are using the `dtype=`, `parse_dates=`, and `converters=` options of the `read_csv` method to
convert specific columns into various data types, including geographic data in the `business_location` column.
See the [`read_csv`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) documentation
for more information.

In [2]:
url = 'https://raw.githubusercontent.com/singlestore-labs/singlestoredb-samples/main/' + \
      'Sample%20datasets/csv/Restaurant_Scores_LIVES_Standard.csv'

In [3]:
def str_to_shapely(x: str) -> shapely.geometry.Point | None:
    """Convert a WKT string to a shapely object while handling NULLs."""
    return shapely.wkt.loads(x) if x else None


# Read URL directly using pd.read_csv
df = pd.read_csv(url, index_col=0,
                 # Use parse_dates=, dtype=, and converters= to specify explicit data types
                 parse_dates=['inspection_date'],
                 # %I (12-hour clock) is required for %p (AM/PM) to affect the parsed hour
                 date_format='%m/%d/%Y %I:%M:%S %p',
                 dtype=dict(business_id=int, business_phone_number=str, business_postal_code=str, inspection_score=float),
                 converters=dict(business_location=str_to_shapely))
df
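
The `date_format=` string pairs a clock hour with an AM/PM marker (`%p`). In Python's `strptime`, `%p` only changes the parsed hour when the hour is read with `%I` (12-hour clock); with `%H` the marker is matched but otherwise ignored. pandas documents `date_format` as a `strftime`-style format, so it is expected to behave the same way. The following is a minimal sketch using only the standard library; the timestamp string is a hypothetical sample value, not a row taken from the data set.

```python
from datetime import datetime

# Hypothetical sample value in the same layout as the date_format string
stamp = '03/29/2019 12:00:00 AM'

# %H reads the hour on a 24-hour clock; the AM/PM marker does not adjust it
with_24h = datetime.strptime(stamp, '%m/%d/%Y %H:%M:%S %p')

# %I reads the hour on a 12-hour clock; the AM/PM marker is applied
with_12h = datetime.strptime(stamp, '%m/%d/%Y %I:%M:%S %p')

print(with_24h.hour, with_12h.hour)
```

If the two hours differ, the `%I` variant is the one that matches how a person would read the timestamp.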

Display the data types in the resulting DataFrame. Note that any objects that pandas does not support natively (e.g., strings, blobs, shapely geometries, etc.) show up as `object`.

In [4]:
df.dtypes
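
To see what each `read_csv` option does in isolation, the following sketch parses a tiny inline CSV instead of the hosted file. The rows, and the use of `str.upper` as a converter, are made up for illustration; only the column names echo the data set. It assumes pandas 2.0 or later, where `date_format=` is available.

```python
import io

import pandas as pd

# Hypothetical two-row sample; the second row has a leading-zero postal code
# and a missing score
csv_text = """business_id,business_name,business_postal_code,inspection_date,inspection_score
101,Cafe A,94110,03/29/2019 12:00:00 AM,92
102,Cafe B,02134,04/15/2019 12:00:00 AM,
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    # Parse this column as timestamps using an explicit format
    parse_dates=['inspection_date'],
    date_format='%m/%d/%Y %I:%M:%S %p',
    # Keep postal codes as text so leading zeros survive; scores become floats
    dtype=dict(business_postal_code=str, inspection_score=float),
    # A converter is called on each raw string value of the column
    converters=dict(business_name=str.upper),
)

print(sample.dtypes)
print(sample)
```

Reading postal codes as `str` is the design choice worth noting: left to inference, pandas would treat the column as integers and drop the leading zero.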

## 3. Ingest a DataFrame into a SingleStoreDB table

1. Create the database
2. Import the library to connect to the database
3. Create the connection to the database
4. Ingest the DataFrame into a table in the newly created database

Set the database name in a variable. It will be used in subsequent queries.

In [5]:
database_name = 'getting_started_notebook'

Here we are using the `database_name` variable in a `%%sql` cell. The syntax for including Python variables
is to surround the variable name with `{{ ... }}`.

In [6]:
%%sql
DROP DATABASE IF EXISTS {{database_name}};
CREATE DATABASE {{database_name}};

<div class="alert alert-block alert-warning">    <b class="fa fa-solid fa-exclamation-circle"></b>    <div>        <p><b>Action Required</b></p>        <p>Make sure to select the <tt>getting_started_notebook</tt> database from the drop-down menu at the top of this notebook.        It updates the <tt>connection_url</tt> which is used by the <tt>%%sql</tt> magic command and SQLAlchemy to make connections to the selected database.</p>    </div></div>

We can use SQLAlchemy and pandas to upload a DataFrame. Note that if the table does not exist, the data types will
be inferred from the data. This may not result in the exact types that you desire. You can define the table in
the database before uploading to get the exact types you want.

If you get an error about the database not being selected, that simply means that your `connection_url` does not
contain a specific database to connect to. You can use the drop-down menu at the top of this notebook (immediately
under the title) to select a database to work with. Changing the selection in the drop-down menu also updates
the `connection_url` variable.

In [7]:
import sqlalchemy as sa

# Create a SQLAlchemy engine and connect
db_connection = sa.create_engine(connection_url).connect()

The SingleStoreDB Python package also adds a convenience function for SQLAlchemy connections
without using the `connection_url`. It automatically gets the connection information from
the `SINGLESTOREDB_URL` environment variable.

In [8]:
import singlestoredb as s2

# Create a SQLAlchemy engine and connect, without having to specify the connection URL
db_connection = s2.create_engine().connect()

# Upload the DataFrame
df.to_sql('sf_restaurant_scores', con=db_connection, if_exists='append', chunksize=1000)
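
As noted earlier, `to_sql` infers column types when the table does not exist, and defining the table first gives you control over them. The sketch below shows that pattern with an in-memory SQLite database, used purely as a stand-in so the example runs without a SingleStoreDB connection. The table name `demo_scores`, its columns, and the sample rows are hypothetical. Against SingleStoreDB you would issue the `CREATE TABLE` statement in a `%%sql` cell and pass the SQLAlchemy connection to `to_sql` instead.

```python
import sqlite3

import pandas as pd

# In-memory SQLite is only a stand-in so this sketch runs anywhere
conn = sqlite3.connect(':memory:')

# Define the table up front with the exact column types we want
conn.execute("""
    CREATE TABLE demo_scores (
        business_id INTEGER PRIMARY KEY,
        business_postal_code VARCHAR(10),
        inspection_score REAL
    )
""")

demo = pd.DataFrame({
    'business_id': [101, 102],
    'business_postal_code': ['94110', '02134'],
    'inspection_score': [92.0, None],
})

# if_exists='append' inserts into the existing table and keeps its definition
demo.to_sql('demo_scores', con=conn, if_exists='append', index=False)

# Read back the declared column types to confirm they were not re-inferred
column_types = {row[1]: row[2] for row in conn.execute('PRAGMA table_info(demo_scores)')}
row_count = conn.execute('SELECT COUNT(*) FROM demo_scores').fetchone()[0]
print(column_types, row_count)
```

Note that `if_exists='replace'` would drop the table and fall back to inferred types, so `'append'` is the option that preserves a hand-written definition.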

## 4. Interact natively with the database using SQL

1. Read the top 10 rows from the table
2. Alter the table to get the date in a date format, not string
3. Read the number of restaurant inspections over time in San Francisco

In [9]:
%%sql
SELECT * FROM {{database_name}}.sf_restaurant_scores LIMIT 10;

In the code block below, we use the `result1 <<` syntax on the `%%sql` line to store the result of the SQL
operation into a variable which can be used later. As with other Jupyter notebooks, you can always get the value
of the last executed cell in the `_` (underscore) variable, but assigning the result to a specific variable name is
generally a safer way to retrieve it.

In [10]:
%%sql result1 <<
SELECT
    DATE_TRUNC('month', inspection_date) AS month,
    COUNT(*) AS count_inspection
FROM
    {{database_name}}.sf_restaurant_scores
GROUP BY
    MONTH
ORDER BY
    MONTH DESC;
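
As a sanity check on the query above, the same monthly count can be sketched in pandas. This is an illustration on a hypothetical three-row DataFrame, not the notebook's data. It uses `dt.to_period('M')` as a rough analogue of `DATE_TRUNC('month', ...)`.

```python
import pandas as pd

# Hypothetical inspection dates: two in March 2019, one in April 2019
inspections = pd.DataFrame({
    'inspection_date': pd.to_datetime(['2019-03-01', '2019-03-15', '2019-04-02']),
})

monthly = (
    inspections
    # Bucket each timestamp into its calendar month
    .groupby(inspections['inspection_date'].dt.to_period('M'))
    # Count rows per month, like COUNT(*)
    .size()
    .rename('count_inspection')
    # Most recent month first, like ORDER BY month DESC
    .sort_index(ascending=False)
)

print(monthly)
```

For the full table, running the aggregation in the database avoids pulling every row into the notebook, which is why the SQL version is used here.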

The output of a `%%sql` cell is a `ResultSet`, which provides methods for converting to various other data types
(e.g., `csv`, `dicts`, `DataFrame`, `PolarsDataFrame`). It is also possible to convert to a DataFrame by passing a
`ResultSet` object to the DataFrame constructor, as we'll see below.

In [11]:
type(result1)

## 5. Visualize with Plotly

We are using [Plotly](https://plotly.com) to visualize the data in `result1`. The first parameter of the
`bar` function requires a DataFrame, so we'll convert `result1` to a DataFrame before calling `bar`.

In [12]:
result1_df = pd.DataFrame(result1)
result1_df[:5]

In [13]:
import plotly.express as px

px.bar(result1_df, x='month', y='count_inspection', title='Inspections by Month')

## 6. Clean up the database

In [14]:
%%sql
DROP DATABASE {{database_name}};

<div id="singlestore-footer" style="background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px"></div>
<div><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png" style="padding: 0px; margin: 0px; height: 24px"/></div>