# Python with MongoDB: MongoDB Data Insertion and Querying Report

## Introduction

This report outlines the process of working with MongoDB, a NoSQL database, using a Python script. The goal is to insert and query data related to doctors' reviews. The report will cover data generation, data insertion, and data querying.

### MongoDB Connection

The Python script uses the pymongo library to connect to a MongoDB database hosted on MongoDB Atlas. The connection string includes the credentials (username and password) necessary for authentication.

In [1]:
import pymongo # pymongo is a python driver for MongoDB
import credentials # load username and password from credentials.py
connection_string = f"mongodb+srv://{credentials.username}:{credentials.password}@cluster0.k3c5wpz.mongodb.net/"
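If the username or password contains characters such as `@`, `:`, or `/`, the connection string above will not parse correctly, because those characters have special meaning in a URI. A minimal sketch of one way to guard against this, assuming a hypothetical helper named `build_connection_string` and placeholder credentials (neither is part of the original script):

```python
from urllib.parse import quote_plus


def build_connection_string(username, password, host):
    """Build a mongodb+srv URI with percent-encoded credentials.

    `build_connection_string` is a hypothetical helper, not part of pymongo.
    """
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}/"


# Placeholder credentials for illustration only (not real values).
uri = build_connection_string(
    "demo_user", "p@ss:word/1", "cluster0.k3c5wpz.mongodb.net"
)
print(uri)
```

In the script itself, the same idea would mean passing `credentials.username` and `credentials.password` through `quote_plus` before interpolating them.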

### Database Selection

The script selects a database named 'ism6562_w05'. MongoDB creates databases lazily, so if this database does not exist yet, it is created when the first document is written to it.

In [2]:
client = pymongo.MongoClient(connection_string) # create a client object to connect to the cluster; get the cluster address from the MongoDB Atlas UI
db = client['ism6562_w05'] # select the database ism6562_w05; MongoDB creates it on the first write if it does not exist

### Data Generation

The data is generated using Python's randomization functions. Lists of possible values for doctor_name, specialty, city, and review_text are predefined. A loop creates 100 review records, each containing random values for these attributes along with a sequential id, a random age, and a random rating. The records are written to a local JSON file, which is then loaded back before insertion.

### Data Semantics

The data used in this project is synthetic, generated within the Python script. No external data sources are involved. The generated data simulates reviews for doctors and includes the following attributes:

- id: A unique identifier for each review.
- doctor_name: The name of the doctor being reviewed.
- age: The age of the reviewer.
- city: The city where the reviewer is located.
- rating: A numerical rating given by the reviewer (from 1 to 5).
- specialty: The medical specialty of the reviewed doctor.
- review_text: A text-based review provided by the reviewer.

In [3]:
import json
from random import randint, choice
# Define some sample data
doctor_names = ["Dr. Smith", "Dr. Johnson", "Dr. Williams", "Dr. Brown", "Dr. Davis"]
specialties = ["Cardiologist", "Dermatologist", "Pediatrician", "Orthopedic Surgeon", "Neurologist"]
cities = ["New York", "Los Angeles", "Chicago", "Houston", "San Francisco"]
review_text = ["Great doctor!", "Very knowledgeable and friendly.", "Helped me a lot.", "Highly recommended."]

# Generate a dataset with 100 records
data = []
for i in range(1, 101):
    record = {
        "id": i,  # sequential identifier, 1 through 100
        "doctor_name": choice(doctor_names),
        "age": randint(18, 60),  # reviewer age, inclusive bounds
        "city": choice(cities),
        "rating": randint(1, 5),  # rating from 1 to 5, inclusive
        "specialty": choice(specialties),  # randomly select a specialty
        "review_text": choice(review_text)
    }
    data.append(record)

# Save the data to a JSON file
with open("Doctors.json", "w") as json_file:
    json.dump(data, json_file, indent=4)
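The loop above draws from Python's global random state, so every run produces a different dataset. A sketch of a reproducible variant, assuming a hypothetical helper named `generate_reviews` and an arbitrary seed value (neither appears in the original script), might look like this:

```python
import random

DOCTOR_NAMES = ["Dr. Smith", "Dr. Johnson", "Dr. Williams", "Dr. Brown", "Dr. Davis"]
SPECIALTIES = ["Cardiologist", "Dermatologist", "Pediatrician", "Orthopedic Surgeon", "Neurologist"]
CITIES = ["New York", "Los Angeles", "Chicago", "Houston", "San Francisco"]
REVIEW_TEXT = ["Great doctor!", "Very knowledgeable and friendly.", "Helped me a lot.", "Highly recommended."]


def generate_reviews(n, seed=None):
    """Return n synthetic review records; the same seed yields the same records.

    `generate_reviews` is a hypothetical helper written for this sketch.
    """
    rng = random.Random(seed)  # private generator, independent of the global state
    return [
        {
            "id": i,
            "doctor_name": rng.choice(DOCTOR_NAMES),
            "age": rng.randint(18, 60),
            "city": rng.choice(CITIES),
            "rating": rng.randint(1, 5),
            "specialty": rng.choice(SPECIALTIES),
            "review_text": rng.choice(REVIEW_TEXT),
        }
        for i in range(1, n + 1)
    ]


reviews = generate_reviews(100, seed=42)  # the seed value 42 is arbitrary

# Basic sanity checks before the records are written or inserted.
assert len(reviews) == 100
assert all(1 <= r["rating"] <= 5 for r in reviews)
```

Fixing the seed makes it easier to compare query results between runs, since the inserted documents stay the same.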

### Load Doctor Data from JSON File

In [4]:
with open("Doctors.json", "r") as json_file: # read back the same file written in the previous cell
    doctor_data = json.load(json_file)

### Collection Creation

Within the selected database, a collection named 'review' is created to store the doctor review data.

### Data Insertion
The synthetic doctor review data, generated earlier, is inserted into the 'review' collection using the insert_many method. This method returns a result object whose inserted_ids attribute lists the _id that MongoDB assigned to each inserted document.

In [5]:
posts = db['review'] # select the collection called 'review'; MongoDB creates it on the first insert if it does not exist
post_id = posts.insert_many(doctor_data).inserted_ids # insert all the review documents into the collection, then return the list of their ids
post_id

[ObjectId('6510d58fc0e72b4ff9e45509'),
 ObjectId('6510d58fc0e72b4ff9e4550a'),
 ObjectId('6510d58fc0e72b4ff9e4550b'),
 ObjectId('6510d58fc0e72b4ff9e4550c'),
 ObjectId('6510d58fc0e72b4ff9e4550d'),
 ObjectId('6510d58fc0e72b4ff9e4550e'),
 ObjectId('6510d58fc0e72b4ff9e4550f'),
 ObjectId('6510d58fc0e72b4ff9e45510'),
 ObjectId('6510d58fc0e72b4ff9e45511'),
 ObjectId('6510d58fc0e72b4ff9e45512'),
 ObjectId('6510d58fc0e72b4ff9e45513'),
 ObjectId('6510d58fc0e72b4ff9e45514'),
 ObjectId('6510d58fc0e72b4ff9e45515'),
 ObjectId('6510d58fc0e72b4ff9e45516'),
 ObjectId('6510d58fc0e72b4ff9e45517'),
 ObjectId('6510d58fc0e72b4ff9e45518'),
 ObjectId('6510d58fc0e72b4ff9e45519'),
 ObjectId('6510d58fc0e72b4ff9e4551a'),
 ObjectId('6510d58fc0e72b4ff9e4551b'),
 ObjectId('6510d58fc0e72b4ff9e4551c'),
 ObjectId('6510d58fc0e72b4ff9e4551d'),
 ObjectId('6510d58fc0e72b4ff9e4551e'),
 ObjectId('6510d58fc0e72b4ff9e4551f'),
 ObjectId('6510d58fc0e72b4ff9e45520'),
 ObjectId('6510d58fc0e72b4ff9e45521'),
 ObjectId('6510d58fc0e72b

In [6]:
client.close() # close the connection to the database
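The cells above stop at insertion, although the introduction also lists querying as a goal. A minimal sketch of a query over the inserted reviews follows, assuming a hypothetical helper named `find_reviews` and illustrative filter values; it relies only on pymongo's `Collection.find(filter, projection)` call and must run while the client is still open, that is, before `client.close()`.

```python
def find_reviews(collection, min_rating, city=None):
    """Return reviews rated at least `min_rating`, optionally for one city.

    `find_reviews` is a hypothetical helper; `collection` is expected to be
    a pymongo collection such as `posts` from the insertion cell.
    """
    # $gte is MongoDB's "greater than or equal" comparison operator.
    query = {"rating": {"$gte": min_rating}}
    if city is not None:
        query["city"] = city
    # The projection {"_id": 0} leaves MongoDB's internal _id out of the results.
    return list(collection.find(query, {"_id": 0}))


# Example call (requires an open connection):
# top_chicago = find_reviews(posts, 4, city="Chicago")
```

Building the filter as a plain dictionary keeps the helper easy to extend, for example with a `specialty` condition.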

![](images/first_insert.png)