### **Step 1: Connecting to the MongoDB Cluster**
### **Creating a new Database 'jewels'**

In [15]:
import pymongo # This line imports the pymongo library, which is a Python driver for MongoDB
import credentials # loading username and password from credentials.py
connection_string = f"mongodb+srv://{credentials.username}:{credentials.password}@cluster0.xfd8951.mongodb.net/" 
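
PyMongo expects the username and password inside a MongoDB URI to be percent-escaped, so a password containing characters such as `@` or `/` would break the connection string above. A minimal sketch of the escaping step follows. The helper `build_connection_string` and the sample credentials are hypothetical; the host is the one used in the cell above.

```python
from urllib.parse import quote_plus

def build_connection_string(username: str, password: str, host: str) -> str:
    """Return a mongodb+srv URI with the credentials percent-escaped."""
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}/"

# Hypothetical credentials, for illustration only
uri = build_connection_string("demo_user", "p@ss/word", "cluster0.xfd8951.mongodb.net")
print(uri)  # mongodb+srv://demo_user:p%40ss%2Fword@cluster0.xfd8951.mongodb.net/
```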

In [16]:
client = pymongo.MongoClient(connection_string) # create a client for the cluster; the cluster address comes from the MongoDB Atlas UI
db = client['jewels'] # get a handle to the database named 'jewels'

# The 'jewels' database does not exist yet. MongoDB creates it lazily, on the first write (for example, the first insert).

### **Step 2:** 
### **Loading synthesized data into MongoDB and analyzing jewelry ratings data**

Here I load a synthesized dataset of jewelry ratings into MongoDB, then explore and analyze it. The data consists of user ratings for various jewelry items; each record holds the item name, its material, and the user rating.

#### **Data Source**

The data used in this analysis is synthesized for demonstration purposes. I generated a random dataset, as shown in the file **'data_prep.ipynb'**, for jewelry items including bracelets, rings, earrings, necklaces, and anklets. Each item has an associated user rating and a material such as silver, gold, titanium, platinum, or diamond. The data was created using Python and then stored in JSON format as **'ratings_dataset.json'**.
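
The generation code itself lives in 'data_prep.ipynb' and is not reproduced here. As a rough sketch of how a comparable dataset could be produced, the following draws random items, materials, and ratings and writes them to a JSON file. The function name `make_ratings`, the record count, the seed, the singular item names, and the output file name `ratings_sketch.json` are all assumptions for illustration, not the actual preparation code.

```python
import json
import random

JEWEL_NAMES = ["bracelet", "ring", "earring", "necklace", "anklet"]
MATERIALS = ["silver", "gold", "titanium", "platinum", "diamond"]

def make_ratings(n, seed=0):
    """Return n random rating records; the seed makes the output repeatable."""
    rng = random.Random(seed)
    return [
        {
            "jewel_name": rng.choice(JEWEL_NAMES),
            "rating": rng.randint(1, 10),  # inclusive on both ends
            "jewel_material": rng.choice(MATERIALS),
        }
        for _ in range(n)
    ]

records = make_ratings(20)

# Written under a different name so the real dataset file is not overwritten
with open("ratings_sketch.json", "w") as f:
    json.dump(records, f, indent=2)
```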

#### **Data Structure** 
The dataset is structured as follows:

**jewel_name:** The name of the jewelry item.<br>
**rating:** User rating for the item on a scale of 1 to 10.<br>
**jewel_material:** The material used in the jewelry item.
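
One quick way to confirm that records follow this structure before loading them is a small validation function. This is a sketch under assumptions: `is_valid_rating` is a hypothetical helper, and the set of allowed materials is taken from the list in the Data Source section.

```python
ALLOWED_MATERIALS = {"silver", "gold", "titanium", "platinum", "diamond"}

def is_valid_rating(record):
    """Check that a record has the three expected fields with sensible values."""
    return (
        isinstance(record.get("jewel_name"), str)
        and isinstance(record.get("rating"), int)
        and 1 <= record["rating"] <= 10
        and record.get("jewel_material") in ALLOWED_MATERIALS
    )

good = {"jewel_name": "ring", "rating": 6, "jewel_material": "titanium"}
bad = {"jewel_name": "ring", "rating": 11, "jewel_material": "titanium"}  # rating out of range
print(is_valid_rating(good), is_valid_rating(bad))  # True False
```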

##### **Collection name: ratings** 

In [17]:
import json

ratings_collection = db["ratings"] # handle to the "ratings" collection in the "jewels" database
json_file = "ratings_dataset.json"

# Drop any existing copy so that re-running this cell does not insert duplicates.
# list_collection_names() returns strings, so compare against the collection name.
if "ratings" in db.list_collection_names():
    ratings_collection.drop()

with open(json_file, "r") as file:
    data = json.load(file)

# Insert the data into the collection
ratings_collection.insert_many(data)


<pymongo.results.InsertManyResult at 0x23f359c8430>

In [18]:
ratings_collection = db["ratings"]


#### View all documents in the collection at a glance below

In [19]:
query = {} # an empty filter matches every document in the collection
docs = ratings_collection.find(query)
for record in docs:
    print(record)

{'_id': ObjectId('6510c42fc3b9a8d56d1e81df'), 'jewel_name': 'bracelet', 'rating': 6, 'jewel_material': 'silver'}
{'_id': ObjectId('6510c42fc3b9a8d56d1e81e0'), 'jewel_name': 'anklet', 'rating': 5, 'jewel_material': 'diamond'}
{'_id': ObjectId('6510c42fc3b9a8d56d1e81e1'), 'jewel_name': 'anklet', 'rating': 6, 'jewel_material': 'platinum'}
{'_id': ObjectId('6510c42fc3b9a8d56d1e81e2'), 'jewel_name': 'anklet', 'rating': 5, 'jewel_material': 'silver'}
{'_id': ObjectId('6510c42fc3b9a8d56d1e81e3'), 'jewel_name': 'ring', 'rating': 6, 'jewel_material': 'titanium'}
{'_id': ObjectId('6510c42fc3b9a8d56d1e81e4'), 'jewel_name': 'anklet', 'rating': 8, 'jewel_material': 'diamond'}
{'_id': ObjectId('6510c42fc3b9a8d56d1e81e5'), 'jewel_name': 'bracelet', 'rating': 8, 'jewel_material': 'diamond'}
{'_id': ObjectId('6510c42fc3b9a8d56d1e81e6'), 'jewel_name': 'necklace', 'rating': 6, 'jewel_material': 'titanium'}
{'_id': ObjectId('6510c42fc3b9a8d56d1e81e7'), 'jewel_name': 'anklet', 'rating': 1, 'jewel_material'

### **Step 3:**
#### **Reviewed the cluster and verified that the data was loaded successfully into the 'ratings' collection of the 'jewels' database**


### **Step 4:**
### Demonstrating an aggregation query on the data
#### **Example aggregation query: <br> Calculate the average rating for each jewelry material and sort it in descending order**

In [20]:
# Example aggregation query: Calculate the average rating for each jewelry material
pipeline = [
    {
        '$group': {
            '_id': '$jewel_material',
            'average_rating': {'$avg': '$rating'}
        }
    },
    {
        '$sort': {'average_rating': -1}
    }
]

result = list(ratings_collection.aggregate(pipeline))

# Print the result
for item in result:
    print(f"Material: {item['_id']}, Average Rating: {item['average_rating']:.2f}")


Material: gold, Average Rating: 8.00
Material: platinum, Average Rating: 6.33
Material: titanium, Average Rating: 6.00
Material: diamond, Average Rating: 5.88
Material: silver, Average Rating: 5.00


##### The query above calculates the average rating for each jewelry material and sorts the results in descending order, which shows which material users rate highest on average. Gold receives the highest average rating, 8.00.
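
To make the two pipeline stages concrete, here is a pure-Python sketch of the same computation: group by material, average the ratings, then sort in descending order. It runs only on the first eight documents printed earlier, so its averages differ from the full-collection output above. `average_by_material` is a hypothetical helper name.

```python
from collections import defaultdict

# First eight documents from the collection listing (without '_id')
sample = [
    {"jewel_name": "bracelet", "rating": 6, "jewel_material": "silver"},
    {"jewel_name": "anklet", "rating": 5, "jewel_material": "diamond"},
    {"jewel_name": "anklet", "rating": 6, "jewel_material": "platinum"},
    {"jewel_name": "anklet", "rating": 5, "jewel_material": "silver"},
    {"jewel_name": "ring", "rating": 6, "jewel_material": "titanium"},
    {"jewel_name": "anklet", "rating": 8, "jewel_material": "diamond"},
    {"jewel_name": "bracelet", "rating": 8, "jewel_material": "diamond"},
    {"jewel_name": "necklace", "rating": 6, "jewel_material": "titanium"},
]

def average_by_material(records):
    """Mimic the $group and $sort stages of the pipeline in plain Python."""
    ratings = defaultdict(list)
    for r in records:  # counterpart of $group on '$jewel_material'
        ratings[r["jewel_material"]].append(r["rating"])
    averages = [
        {"_id": material, "average_rating": sum(values) / len(values)}
        for material, values in ratings.items()
    ]
    # counterpart of $sort: {'average_rating': -1}
    return sorted(averages, key=lambda d: d["average_rating"], reverse=True)

for item in average_by_material(sample):
    print(f"Material: {item['_id']}, Average Rating: {item['average_rating']:.2f}")
```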

### **Step 5: Save the query results to a JSON file.** 
To save the query results, we can use Python's built-in json module.

In [21]:
# Save the results from the query to a JSON file.
import json

# 'result' holds the aggregation output as a list of dictionaries
with open('results.json', 'w') as out_file:
    json.dump(result, out_file)

**The results are saved to 'results.json' in JSON format.**

### **Step 6: Terminate the cluster connection after use** 

In [22]:
client.close()
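
Once the connection is closed, the export can still be checked offline by reading the file back with `json.load` and comparing it with what was written. The sketch below uses a temporary directory and a hypothetical `result` list in the same shape as the aggregation output (two values copied from the printed averages, which are rounded to two decimals), so it runs without a database connection.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the aggregation output
result = [
    {"_id": "gold", "average_rating": 8.0},
    {"_id": "platinum", "average_rating": 6.33},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "results.json")
    with open(path, "w") as out_file:
        json.dump(result, out_file)
    with open(path) as in_file:
        loaded = json.load(in_file)

print(loaded == result)  # True
```

The round trip works here because each `_id` is a plain material name. Documents that still carry an `ObjectId` are not serializable by the standard `json` module and would need converting first, for example by passing `default=str` to `json.dump`.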