From 8f40f32a47d6a2ff80170e0eee7513cd84e18ab1 Mon Sep 17 00:00:00 2001 From: Joao Fitas Date: Fri, 25 Jul 2025 14:26:00 +0100 Subject: [PATCH] remove SingleStore now 2024 notebook --- notebooks/singlestore-now-2024/meta.toml | 11 - notebooks/singlestore-now-2024/notebook.ipynb | 441 ------------------ 2 files changed, 452 deletions(-) delete mode 100644 notebooks/singlestore-now-2024/meta.toml delete mode 100644 notebooks/singlestore-now-2024/notebook.ipynb diff --git a/notebooks/singlestore-now-2024/meta.toml b/notebooks/singlestore-now-2024/meta.toml deleted file mode 100644 index 944fe0f..0000000 --- a/notebooks/singlestore-now-2024/meta.toml +++ /dev/null @@ -1,11 +0,0 @@ -[meta] -authors=["chetan-thote"] -title = "SingleStore Now 2024 Raffle" -description = """ - "Explore the power of SingleStore in this interactive notebook by creating an account, loading data, and running queries for a chance to win the SingleStore Now 2024 Raffle!" """ -icon = "radar" -difficulty="intermediate" -tags = ["mongo", "embeddings", "vector", "genai", "kai", "starter"] -lesson_areas=["Kai", "AI"] -destinations = ["spaces"] -minimum_tier="free-shared" diff --git a/notebooks/singlestore-now-2024/notebook.ipynb b/notebooks/singlestore-now-2024/notebook.ipynb deleted file mode 100644 index 05f1261..0000000 --- a/notebooks/singlestore-now-2024/notebook.ipynb +++ /dev/null @@ -1,441 +0,0 @@ -{ - "cells": [ - { - "id": "fc7e7b35", - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - "
SingleStore Notebooks
\n", - "

SingleStore Now 2024 Raffle

\n", - "
\n", - "
" - ] - }, - { - "id": "50e92616", - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - " \n", - "
\n", - "

Note

\n", - "

This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to Start using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.

\n", - "
\n", - "
" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "The data set used in this competition/demo contains some E-commerce data revolving around customers and products that they have purchased. In this notebook, we will run a few queries using SingleStore Kai which will allow us to migrate MongoDB data and run MongoDB queries directly through SingleStore. To create your entry for the raffle, please open and complete the following form: https://forms.gle/n8KjTpJgPL29wFHV9\n", - "\n", - "If you have any issues while completing the form, please reach out to a SingleStore team member at the event." - ], - "id": "9019c195" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Install libraries and import modules\n", - "\n", - "First, we will need to import the necessary dependencies into our notebook environment. This includes some python libraries needed to run our queries." - ], - "id": "050d60ed" - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install pymongo pandas ipywidgets --quiet" - ], - "id": "f34495f0" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To ensure that we have a database we can use, we will then make sure that a database exists. If it doesn't we will have the notebook create one for us." - ], - "id": "2bdd3998" - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "shared_tier_check = %sql show variables like 'is_shared_tier'\n", - "\n", - "if shared_tier_check and shared_tier_check[0][1] == 'ON':\n", - " current_database = %sql SELECT DATABASE() as CurrentDatabase\n", - " database_to_use = current_database[0][0]\n", - "else:\n", - " database_to_use = \"new_transactions\"\n", - " %sql CREATE DATABASE {{database_to_use}}" - ], - "id": "1b1fb18e" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, let's run the code that will actually import the needed dependencies, including `pymongo`, that will be used to connect to SingleStore and our Mongo instance where the initial data is stored." - ], - "id": "7a940e1f" - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import time\n", - "import numpy as np\n", - "import pandas as pd\n", - "import pymongo\n", - "from pymongo import MongoClient" - ], - "id": "40a410f2" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Connect to Atlas and SingleStore Kai endpoints\n", - "\n", - "Next, we will connect to the MongoDB Atlas instance using a Mongo client. We will need to connect to this instance to get our initial data, currently stored in Mongo." - ], - "id": "68eadfe5" - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "# No need to edit anything\n", - "myclientmongodb = pymongo.MongoClient(\"mongodb+srv://mongo_sample_reader:SingleStoreRocks27017@cluster1.tfutgo0.mongodb.net/?retryWrites=true&w=majority\")\n", - "mydbmongodb = myclientmongodb[\"new_transactions\"]\n", - "mongoitems = mydbmongodb[\"items\"]\n", - "mongocusts = mydbmongodb[\"custs\"]\n", - "mongotxs = mydbmongodb[\"txs\"]" - ], - "id": "9259e348" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then, we will need to connect to the SingleStore Kai API which will allow us to import and access the Mongo data we will move over from Mongo Atlas." - ], - "id": "c7d1953d" - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "db_to_use = database_to_use\n", - "s2clientmongodb = pymongo.MongoClient(connection_url_kai)\n", - "s2dbmongodb = s2clientmongodb[db_to_use]\n", - "s2mongoitems = s2dbmongodb[\"items\"]\n", - "s2mongocusts = s2dbmongodb[\"custs\"]\n", - "s2mongotxs = s2dbmongodb[\"txs\"]" - ], - "id": "2d4eced0" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Copy Atlas collections into SingleStore Kai\n", - "\n", - "As our next step, we need to get our MongoDB data hosted in Atlas over to SingleStore. For this, we will run the following code that will then replicate the selected Mongo collections into our SingleStore instance. This will make the MongoDB data available in SingleStore, allowing us to migrate away from MongoDB and to perform all of our data storage and queries in a single database instead of having multiple data silos." - ], - "id": "84bc8dc3" - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "mongocollections = [mongoitems, mongocusts, mongotxs]\n", - "\n", - "for mongo_collection in mongocollections:\n", - " df = pd.DataFrame(list(mongo_collection.find())).reset_index(drop=True)\n", - " data_dict = df.to_dict(orient='records')\n", - " s2mongo_collection = s2dbmongodb[mongo_collection.name]\n", - " s2mongo_collection.insert_many(data_dict)" - ], - "id": "400f97a1" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## QUERY 1: Total quantity of products sold across all products\n", - "\n", - "Our first query on the newly migrated data will be to retrieve the total quanitity of products across every product within our dataset. As you'll see, even though we are running in SingleStore, we can still use Mongo query syntax using SingleStore Kai." - ], - "id": "ddd381fb" - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "num_iterations = 10\n", - "mongo_times = []\n", - "\n", - "# Updated pipeline for total quantity of products sold across all products\n", - "pipeline = [\n", - " {\"$group\": {\"_id\": None, \"totalQuantity\": {\"$sum\": \"$item.quantity\"}}}\n", - "]\n", - "\n", - "# Simulating same for s2mongoitems\n", - "s2_times = []\n", - "for i in range(num_iterations):\n", - " s2_start_time = time.time()\n", - " s2_result = s2mongoitems.aggregate(pipeline)\n", - " s2_stop_time = time.time()\n", - " s2_times.append(s2_stop_time - s2_start_time)\n", - "\n", - "# Retrieving total quantity from the result\n", - "total_quantity = next(s2_result)[\"totalQuantity\"] if s2_result else 0\n", - "\n", - "# Returning the numeric values of total quantity sold\n", - "print(\"Total Product Quantity Sold is\",total_quantity)" - ], - "id": "a303e52f" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### ACTION ITEM!\n", - "Take the output from this query and put it into the **ANSWER NUMBER 1** field in the Google Form." - ], - "id": "1ef685e6" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## QUERY 2: Top selling Product\n", - "\n", - "Our next query will be to find the top selling product within our data. Once again, we are issuing a Mongo query against our SingleStore instance. If we had an application integrated with MongoDB but wanted to migrate to SingleStore, we could do so without having to rewrite the queries within our application!" - ], - "id": "999bc317" - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "# Updated pipeline to return the #1 selling product based on total quantity sold\n", - "pipeline = [\n", - " {\"$group\": {\n", - " \"_id\": \"$item.name\", # Group by product name\n", - " \"total_quantity_sold\": {\"$sum\": \"$item.quantity\"} # Sum of quantities sold\n", - " }},\n", - " {\"$sort\": {\"total_quantity_sold\": -1}}, # Sort by total quantity sold in descending order\n", - " {\"$limit\": 1} # Limit to the top product\n", - "]\n", - "\n", - "s2_result = s2mongoitems.aggregate(pipeline)\n", - "\n", - "# Retrieve the name of the #1 selling product\n", - "top_product = next(s2_result, None)\n", - "if top_product:\n", - " product_name = top_product[\"_id\"]\n", - " total_quantity_sold = top_product[\"total_quantity_sold\"]\n", - "else:\n", - " product_name = \"No Data\"\n", - " total_quantity_sold = 0\n", - "\n", - "# Return the #1 selling product and its total quantity sold\n", - "print(\"Top-Selling product : \",product_name,\"With total quantity sold \",total_quantity_sold)" - ], - "id": "8169cd92" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### ACTION ITEM!\n", - "Take the output from this query and put it into the **ANSWER NUMBER 2** field in the Google Form." - ], - "id": "d8aae59c" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## QUERY 3: Top selling Location" - ], - "id": "bb9b91f5" - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "# Updated pipeline to exclude \"Online\" and get top-selling location\n", - "pipeline = [\n", - " {\"$lookup\":\n", - " {\n", - " \"from\": \"custs\",\n", - " \"localField\": \"customer.email\",\n", - " \"foreignField\": \"email\",\n", - " \"as\": \"transaction_links\",\n", - " }\n", - " },\n", - " {\"$match\": {\"store_location\": {\"$ne\": \"Online\"}}}, # Exclude Online location\n", - " {\"$limit\": 100},\n", - " {\"$group\":\n", - " {\n", - " \"_id\": {\"location\": \"$store_location\"},\n", - " \"count\": {\"$sum\": 1}\n", - " }\n", - " },\n", - " {\"$sort\": {\"count\": -1}},\n", - " {\"$limit\": 1}\n", - "]\n", - "\n", - "\n", - "s2_result = s2mongotxs.aggregate(pipeline)\n", - "\n", - "\n", - "# Retrieve the top-selling location excluding \"Online\"\n", - "top_location = next(s2_result, None)\n", - "if top_location:\n", - " location_name = top_location[\"_id\"][\"location\"]\n", - " transaction_count = top_location[\"count\"]\n", - "else:\n", - " location_name = \"No Data\"\n", - " transaction_count = 0\n", - "\n", - "# Return the top-selling location and transaction count\n", - "\n", - "print(\"Top-Selling Location : \",location_name,\"With transaction of Count \",transaction_count)" - ], - "id": "2863eb28" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### ACTION ITEM!\n", - "Take the output from this query and put it into the **ANSWER NUMBER 3** field in the Google Form." - ], - "id": "2e1ec36a" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up and submit!\n", - "\n", - "**Make sure to click submit on your Google Form to make sure you've been entered into the SingleStore NOW 2024 raffle!**\n", - "\n", - "Additionally, if you'd like to clean up your instance, you can run the statement below. To learn more about SingleStore, please connect with one of our SingleStore reps here at the conference!" - ], - "id": "793d7a60" - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - " \n", - "
\n", - "

Action Required

\n", - "

If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.

\n", - "
\n", - "
" - ], - "id": "29c7c0ec" - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "shared_tier_check = %sql show variables like 'is_shared_tier'\n", - "if not shared_tier_check or shared_tier_check[0][1] == 'OFF':\n", - " %sql DROP DATABASE IF EXISTS new_transactions;" - ], - "id": "ca401f18" - }, - { - "id": "35f88772", - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "
" - ] - } - ], - "metadata": { - "jupyterlab": { - "notebooks": { - "version_major": 6, - "version_minor": 4 - } - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.6" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -}