MongoDB:
  1. Take one of the previous SQL databases created by you
  2. In Python, write the code which extracts the tables from that DB
  3. Load the data from these tables into MongoDB collections (create the collections beforehand)
  3.1. Create a unique index that matches the key in the original SQL database
  4. Insert at least 3 entries into these collections
  5. Perform at least 3 selections based on different (MIXED: "one or another", "one and another", etc.) filters
  6. Perform at least 3 update operations


Redis DB:
  1. Use the same SQL data as in Part 1.
  2. Create Redis DB, with the data from SQL:
    Set a key with a specific value in Redis.
    Retrieve the value of a key from Redis.
    Delete a key-value pair from Redis.

# MongoDB

Taking a previous database (the students database from the July 27 task) and preparing it for insertion into MongoDB.

In [6]:
# connect to mysql
import mysql.connector
import pandas as pd

conn = mysql.connector.connect(host='localhost',
                                database= "student_enrollment_2023",
                                user='root',
                                password='')

cursor = conn.cursor(buffered= True)

query1 = "SELECT * FROM students;"
cursor.execute(query1)
result = cursor.fetchall()
columns1 = [desc[0] for desc in cursor.description]
students_table = pd.DataFrame(result, columns = columns1)

query2 = "SELECT * FROM courses;"
cursor.execute(query2)
result = cursor.fetchall()
columns2 = [desc[0] for desc in cursor.description]
courses_table = pd.DataFrame(result, columns = columns2)

query3 = "SELECT * FROM course_choices;"
cursor.execute(query3)
result = cursor.fetchall()
columns3 = [desc[0] for desc in cursor.description]
choices_table = pd.DataFrame(result, columns = columns3)

cursor.close()
conn.close()
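
The three fetches above repeat the same pattern. A minimal sketch of a reusable helper follows, assuming any DB-API 2.0 connection. An in-memory `sqlite3` database stands in for the MySQL server here so that the snippet runs on its own; `fetch_table` and `ALLOWED_TABLES` are hypothetical names, and the sample row is copied from the `final_df` output in this notebook.

```python
import sqlite3

# Hypothetical whitelist: table names cannot be bound as query parameters,
# so they are checked against known names before being formatted into SQL.
ALLOWED_TABLES = {"students", "courses", "course_choices"}


def fetch_table(conn, table_name):
    """Return (column_names, rows) for one table via a DB-API 2.0 connection."""
    if table_name not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table_name!r}")
    cursor = conn.cursor()
    try:
        cursor.execute(f"SELECT * FROM {table_name}")
        rows = cursor.fetchall()
        columns = [desc[0] for desc in cursor.description]
    finally:
        cursor.close()
    return columns, rows


# Stand-in database so the sketch is runnable without a MySQL server.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, "
    "surname TEXT, credits_taken INTEGER)"
)
demo_conn.execute("INSERT INTO students VALUES (1, 'Maria', 'Jones', 30)")

columns, rows = fetch_table(demo_conn, "students")
print(columns)
print(rows)
demo_conn.close()
```

With the notebook's MySQL connection, `pd.DataFrame(rows, columns=columns)` would rebuild each table the same way as the cell above does.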

In [14]:
merge1 = choices_table.merge(courses_table, how = "left", on = "course_id")

In [16]:
student_df = merge1.merge(students_table, how = "left", on = "student_id")

In [19]:
students_df = student_df.drop(["choice_id", "course_id", "student_no_limit", "credits"], axis = 1)
students_df

     student_id  course_name           name    surname   credits_taken
0    2           Spectroscopy          Inga    Gonzalez  30
1    1           Linear Algebra        Maria   Jones     30
2    2           Enzymology            Inga    Gonzalez  30
3    3           Enzymology            Julia   Pipiro    30
4    4           Linear Algebra        Chris   Goblin    30
...  ...         ...                   ...     ...       ...
241  59          Physics               Andrew  Powell    30
242  15          Analytical Chemistry  Noah    Thompson  30
243  59          Quantum Physics       Andrew  Powell    30
244  59          Stereochemistry       Andrew  Powell    30


In [24]:
grouped = students_df.groupby("student_id")["course_name"].agg(list).reset_index()

In [25]:
final_df = students_table.merge(grouped, how = "left", on = "student_id")
final_df.head(10)

   student_id  name       surname    credits_taken  course_name
0  1           Maria      Jones      30             [Linear Algebra, Microbiology, Panel readings,...
1  2           Inga       Gonzalez   30             [Spectroscopy, Enzymology, Enzymology]
2  3           Julia      Pipiro     30             [Enzymology, Electromechanics, Stereochemistry...
3  4           Chris      Goblin     30             [Linear Algebra, Physics, Panel readings, Anal...
4  5           Ken        Katty      30             [Psychology, Enzymology, Stereochemistry, Disc...
5  6           Aliya      Examplier  30             [Physics, Electromechanics, Electromechanics]
6  7           Thomas     Capitol    30             [Linear Algebra, Electromechanics, Microbiolog...
7  8           Justin     Zymus      30             [Biochemistry, Analytical Chemistry, Panel rea...
8  9           Cassandra  Antrapa    30             [Panel readings, Molecular Biotechnology, Phys...
9  10          Mia        Johnson    30             [Physics, Biochemistry, Stereochemistry]
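
For readers unfamiliar with `groupby(...).agg(list)`, the grouping step can be sketched in plain Python. This is only an illustration of what the aggregation produces, using the first five rows of the merged table; `group_courses` is a hypothetical helper. The last lines show one optional way to drop repeated course names (such as Enzymology appearing twice for student 2) while keeping their order, which the notebook itself does not do.

```python
from collections import defaultdict

# First five (student_id, course_name) pairs from the merged table.
pairs = [
    (2, "Spectroscopy"),
    (1, "Linear Algebra"),
    (2, "Enzymology"),
    (3, "Enzymology"),
    (4, "Linear Algebra"),
]


def group_courses(rows):
    """Collect course names per student, preserving row order."""
    grouped = defaultdict(list)
    for student_id, course_name in rows:
        grouped[student_id].append(course_name)
    return dict(grouped)


grouped = group_courses(pairs)
print(grouped[2])  # ['Spectroscopy', 'Enzymology']

# Optional clean-up, not done in the notebook: remove repeats, keep order.
courses_student_2 = ["Spectroscopy", "Enzymology", "Enzymology"]
unique_courses = list(dict.fromkeys(courses_student_2))
print(unique_courses)  # ['Spectroscopy', 'Enzymology']
```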


To have more data for the tasks, I added more entries to the database from the July 27 task.

Inserting the data into MongoDB

In [33]:
# Connect to mongo db
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi

uri = "mongodb+srv://<logininfo>@cluster0test.<>.mongodb.net/?retryWrites=false&w=majority"

# Create a new client and connect to the server
client = MongoClient(uri, server_api=ServerApi('1'))

# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)

Pinged your deployment. You successfully connected to MongoDB!


In [49]:
db = client["Cluster0test"]
collection = db["Students"]
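
Task 3.1 asks for a unique index that matches the SQL key. A minimal sketch follows, assuming `student_id` is the primary key of the `students` table (the notebook does not show the schema) and that the index is created before the data is loaded. `ensure_unique_student_index` is a hypothetical helper around PyMongo's `create_index`.

```python
def ensure_unique_student_index(collection):
    """Create an ascending unique index on student_id and return its name.

    Assumes student_id is the primary key in the source SQL table.
    """
    # 1 means ascending order; unique=True makes MongoDB reject a second
    # document with the same student_id.
    return collection.create_index([("student_id", 1)], unique=True)
```

Calling `ensure_unique_student_index(collection)` on the collection defined above would create the index. After that, an `insert_one` with a `student_id` that already exists raises `pymongo.errors.DuplicateKeyError`. If the collection already contains duplicate `student_id` values, the index creation itself fails with an error.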

In [51]:
records = final_df.to_dict(orient = "records")
collection.insert_many(records)

<pymongo.results.InsertManyResult at 0x7f43e04c82e0>

Inserting at least 3 entries into the collection

In [52]:
entry1 = {
    "student_id" : 60,
    "name" : "Julian",
    "surname" : "Birkin",
    "credits_taken" : 30,
    "course_name" : ["Microbiology", "Psychology", "Biochemistry", "Analytical Chemistry"]
}

entry2 = {
    "student_id" : 61,
    "name" : "Emma",
    "surname" : "Watts",
    "credits_taken" : 30,
    "course_name" : ["Linear Algebra", "Spectroscopy", "Enzymology", "Panel readings"]
}

entry3 = {
    "student_id" : 62,
    "name" : "Jimmy",
    "surname" : "Jams",
    "credits_taken" : 30,
    "course_name" : ["Linear Algebra", "Molecular Biotechnology", "Quantum Physics", "Stereochemistry", "Economics", "Psychology"]
}

entry4 = {
    "student_id" : 63,
    "name" : "Melinda",
    "surname" : "Morris",
    "credits_taken" : 30,
    "course_name" : ["Discrete Mathematics", "Electromechanics", "Physics"]
}

entry5 = {
    "student_id" : 64,
    "name" : "Zizi",
    "surname" : "Melou",
    "credits_taken" : 30,
    "course_name" : ["Spectroscopy", "Biochemistry", "Economics"]
}

In [53]:
collection.insert_many([entry1, entry2, entry3, entry4, entry5])

<pymongo.results.InsertManyResult at 0x7f43e04161a0>

Performing at least 3 selections based on different (MIXED: "one or another", "one and another", etc.) filters

In [56]:
# selection1: finding students with the names Aiden or Ava

results = collection.find({"$or":[{"name":"Aiden"},{"name":"Ava"}]})

In [57]:
for entry in results:
    print(entry)

{'_id': ObjectId('64da2e406deb66336799158c'), 'student_id': 14, 'name': 'Ava', 'surname': 'Williams', 'credits_taken': 30, 'course_name': ['Enzymology', 'Molecular Biotechnology', 'Panel readings', 'Molecular Biotechnology', 'Psychology']}
{'_id': ObjectId('64da2e406deb663367991591'), 'student_id': 19, 'name': 'Aiden', 'surname': 'Brown', 'credits_taken': 30, 'course_name': ['Microbiology', 'Spectroscopy', 'Physics']}


In [54]:
# selection2: finding students who are studying either microbiology or economics

results = collection.find({"course_name":{"$in":['Microbiology' , 'Economics']}})

In [55]:
for entry in results:
    print(entry)

{'_id': ObjectId('64da2e406deb66336799157f'), 'student_id': 1, 'name': 'Maria', 'surname': 'Jones', 'credits_taken': 30, 'course_name': ['Linear Algebra', 'Microbiology', 'Panel readings', 'Discrete Mathematics']}
{'_id': ObjectId('64da2e406deb663367991585'), 'student_id': 7, 'name': 'Thomas', 'surname': 'Capitol', 'credits_taken': 30, 'course_name': ['Linear Algebra', 'Electromechanics', 'Microbiology', 'Panel readings']}
{'_id': ObjectId('64da2e406deb663367991589'), 'student_id': 11, 'name': 'Ethan', 'surname': 'Martinez', 'credits_taken': 30, 'course_name': ['Analytical Chemistry', 'Quantum Physics', 'Discrete Mathematics', 'Microbiology']}
{'_id': ObjectId('64da2e406deb66336799158e'), 'student_id': 16, 'name': 'Sophia', 'surname': 'Davis', 'credits_taken': 30, 'course_name': ['Panel readings', 'Economics', 'Electromechanics', 'Physics']}
{'_id': ObjectId('64da2e406deb663367991591'), 'student_id': 19, 'name': 'Aiden', 'surname': 'Brown', 'credits_taken': 30, 'course_name': ['Microbi

In [58]:
# selection3: finding students who are studying biochemistry and analytical chemistry

results = collection.find({"course_name":{"$all":['Biochemistry','Analytical Chemistry']}})

In [59]:
for entry in results:
    print(entry)

{'_id': ObjectId('64da2e406deb663367991586'), 'student_id': 8, 'name': 'Justin', 'surname': 'Zymus', 'credits_taken': 30, 'course_name': ['Biochemistry', 'Analytical Chemistry', 'Panel readings', 'Analytical Chemistry']}
{'_id': ObjectId('64da2e406deb66336799158a'), 'student_id': 12, 'name': 'Olivia', 'surname': 'Smith', 'credits_taken': 30, 'course_name': ['Psychology', 'Biochemistry', 'Psychology', 'Analytical Chemistry']}
{'_id': ObjectId('64da31276deb6633679915ba'), 'student_id': 60, 'name': 'Julian', 'surname': 'Birkin', 'credits_taken': 30, 'course_name': ['Microbiology', 'Psychology', 'Biochemistry', 'Analytical Chemistry']}
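
Each selection above uses a single operator. As a sketch of a mixed filter, the document below combines `$or` on names with `$all` on courses. The small `matches` function is a hypothetical, simplified stand-in for MongoDB's matching (it only handles the operators used here), included so the logic can be checked without a server. The two sample documents are copied from the selection 3 output.

```python
# Filter: (name is "Justin" OR name is "Julian") AND the student takes
# both Biochemistry and Analytical Chemistry.
mixed_filter = {
    "$and": [
        {"$or": [{"name": "Justin"}, {"name": "Julian"}]},
        {"course_name": {"$all": ["Biochemistry", "Analytical Chemistry"]}},
    ]
}


def matches(doc, query):
    """Simplified local check; supports only $and, $or, $in, $all, equality."""
    for key, cond in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            value = doc.get(key)
            values = value if isinstance(value, list) else [value]
            if "$in" in cond and not any(v in cond["$in"] for v in values):
                return False
            if "$all" in cond and not all(w in values for w in cond["$all"]):
                return False
        else:
            value = doc.get(key)
            if isinstance(value, list):
                if cond not in value:
                    return False
            elif value != cond:
                return False
    return True


justin = {"name": "Justin", "course_name": [
    "Biochemistry", "Analytical Chemistry", "Panel readings", "Analytical Chemistry"]}
olivia = {"name": "Olivia", "course_name": [
    "Psychology", "Biochemistry", "Psychology", "Analytical Chemistry"]}

print(matches(justin, mixed_filter))  # True
print(matches(olivia, mixed_filter))  # False
```

With the live collection, `collection.find(mixed_filter)` would run the same filter on the server.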


Performing at least 3 update operations

In [60]:
# update1: changing student's name from Lucas to Lukas

collection.update_one({"student_id": 21}, {"$set": {"name": "Lukas"}})

<pymongo.results.UpdateResult at 0x7f43e04cb5e0>

In [64]:
# update2: replacing the duplicate "Linear Algebra" entry with "Economics", since Linear Algebra was listed twice

collection.update_one({"student_id": 48}, {"$set": {'course_name': ['Quantum Physics', 'Microbiology', 'Molecular Biotechnology', 'Linear Algebra', 'Economics']}})

<pymongo.results.UpdateResult at 0x7f43e044a140>

In [65]:
# update3: adding a middle name and changing surname

collection.update_one({"student_id": 60}, {"$set": {"name": "Julian Neil", "surname": "Williams"}})

<pymongo.results.UpdateResult at 0x7f43f83e3c10>
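
Update 2 rewrites the whole `course_name` array by hand. A possible alternative, sketched here under the assumption that replacing the first matching element is acceptable, uses MongoDB's positional `$` operator. `replace_first_course` is a hypothetical helper, and the resulting course order may differ from the list written out in update 2.

```python
def replace_first_course(collection, student_id, old_course, new_course):
    """Replace the first occurrence of old_course in a student's course list.

    Returns the number of documents modified (0 or 1).
    """
    result = collection.update_one(
        # The array field must appear in the filter so that the positional
        # $ operator knows which element matched.
        {"student_id": student_id, "course_name": old_course},
        {"$set": {"course_name.$": new_course}},
    )
    return result.modified_count
```

A call such as `replace_first_course(collection, 48, "Linear Algebra", "Economics")` would target the same student as update 2. Checking `modified_count` (or `matched_count`) on the returned `UpdateResult` also confirms whether an update actually touched a document, which the printed `UpdateResult` object alone does not show.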

# Redis

Putting the same data (final_df) into Redis

In [66]:
import redis

In [67]:
r = redis.StrictRedis(host= 'localhost', port=6379, db=0)

Way 1: storing all the data extracted from MySQL (the same final_df table built in the MongoDB exercise) as a *whole* in a single value.

In [80]:
import pickle
df_encoded = pickle.dumps(final_df)
r.set("student_data", df_encoded)

True

In [81]:
# retrieving the info
retrieved_info = r.get("student_data")
final = pickle.loads(retrieved_info)
final

   student_id  name       surname    credits_taken  course_name
0  1           Maria      Jones      30             [Linear Algebra, Microbiology, Panel readings,...
1  2           Inga       Gonzalez   30             [Spectroscopy, Enzymology, Enzymology]
2  3           Julia      Pipiro     30             [Enzymology, Electromechanics, Stereochemistry...
3  4           Chris      Goblin     30             [Linear Algebra, Physics, Panel readings, Anal...
4  5           Ken        Katty      30             [Psychology, Enzymology, Stereochemistry, Disc...
5  6           Aliya      Examplier  30             [Physics, Electromechanics, Electromechanics]
6  7           Thomas     Capitol    30             [Linear Algebra, Electromechanics, Microbiolog...
7  8           Justin     Zymus      30             [Biochemistry, Analytical Chemistry, Panel rea...
8  9           Cassandra  Antrapa    30             [Panel readings, Molecular Biotechnology, Phys...
9  10          Mia        Johnson    30             [Physics, Biochemistry, Stereochemistry]


In [82]:
# deleting the key
r.delete("student_data")

1
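
Pickle works here, but unpickling data from a shared Redis instance can execute arbitrary code, and the stored bytes are only readable from Python. A sketch of a JSON-based alternative follows, assuming the records are plain dictionaries in the shape produced by `final_df.to_dict(orient="records")` with native Python values. The Redis calls are left as comments so the snippet runs without a server.

```python
import json

# Two records in the shape produced by final_df.to_dict(orient="records").
records = [
    {"student_id": 2, "name": "Inga", "surname": "Gonzalez",
     "credits_taken": 30,
     "course_name": ["Spectroscopy", "Enzymology", "Enzymology"]},
    {"student_id": 10, "name": "Mia", "surname": "Johnson",
     "credits_taken": 30,
     "course_name": ["Physics", "Biochemistry", "Stereochemistry"]},
]

encoded = json.dumps(records)      # str, can be stored as a Redis value
# r.set("student_data", encoded)   # with the notebook's Redis client
# encoded = r.get("student_data")  # returns bytes; json.loads accepts bytes
decoded = json.loads(encoded)

print(decoded == records)  # True
```

Unlike the pickled DataFrame, the decoded value is a list of dictionaries, so it would need `pd.DataFrame(decoded)` to become a table again.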

Way 2: transforming final_df further and storing each row (entry) as a separate value.

In [88]:
for index, row in final_df.iterrows():
    student_id = row["student_id"]
    row["course_name"] = '/'.join(row["course_name"])  # join into one string, because a Redis hash value cannot be a list
    data = {"name": row["name"], "surname": row["surname"], "credits_taken": row["credits_taken"], "course_name": row["course_name"]}

    for key, value in data.items():
        r.hset(student_id, key, value)
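
The inner loop above sends one `HSET` per field. Recent redis-py versions accept a `mapping` argument that writes all fields in one call; whether the installed version supports it is an assumption, since the notebook does not state it. The sketch below also prefixes the key with `student:`, a hypothetical naming convention that keeps these keys apart from other data in the same Redis database. `build_student_hash` is a hypothetical helper, and the Redis call is commented out so the snippet runs standalone.

```python
def build_student_hash(record):
    """Return (key, mapping) for one student record.

    The course list is joined with '/' because a Redis hash value
    cannot be a list.
    """
    key = f"student:{record['student_id']}"  # hypothetical key prefix
    mapping = {
        "name": record["name"],
        "surname": record["surname"],
        "credits_taken": record["credits_taken"],
        "course_name": "/".join(record["course_name"]),
    }
    return key, mapping


record = {
    "student_id": 10,
    "name": "Mia",
    "surname": "Johnson",
    "credits_taken": 30,
    "course_name": ["Physics", "Biochemistry", "Stereochemistry"],
}
key, mapping = build_student_hash(record)
# r.hset(key, mapping=mapping)  # one call per student instead of one per field
print(key)                     # student:10
print(mapping["course_name"])  # Physics/Biochemistry/Stereochemistry
```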

In [93]:
#retrieving some entries
student_10 = r.hgetall(10)
student_10

{b'name': b'Mia',
 b'surname': b'Johnson',
 b'credits_taken': b'30',
 b'course_name': b'Physics/Biochemistry/Stereochemistry'}

In [94]:
student_55 = r.hgetall(55)
student_55

{b'credits_taken': b'30',
 b'name': b'Gabriel',
 b'course_name': b'Stereochemistry/Linear Algebra/Spectroscopy/Psychology/Molecular Biotechnology',
 b'surname': b'Rogers'}
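
`hgetall` returns bytes for both field names and values, as the outputs above show. A small sketch of turning such a result back into Python types follows; `decode_student_hash` is a hypothetical helper, and the input is the `student_10` output copied from above. It assumes no course name contains the `/` separator, which holds for the data shown here.

```python
def decode_student_hash(raw):
    """Convert a bytes-keyed hash from hgetall into native Python values."""
    decoded = {k.decode("utf-8"): v.decode("utf-8") for k, v in raw.items()}
    decoded["credits_taken"] = int(decoded["credits_taken"])
    decoded["course_name"] = decoded["course_name"].split("/")
    return decoded


raw_student_10 = {
    b"name": b"Mia",
    b"surname": b"Johnson",
    b"credits_taken": b"30",
    b"course_name": b"Physics/Biochemistry/Stereochemistry",
}
student = decode_student_hash(raw_student_10)
print(student["course_name"])  # ['Physics', 'Biochemistry', 'Stereochemistry']
```

Creating the client with `decode_responses=True` is another option: redis-py then returns `str` instead of `bytes`, leaving only the integer conversion and the split to do by hand.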

In [95]:
#deleting entry

r.delete(10)

1

In [97]:
student_10 = r.hgetall(10)
student_10

{}