In [2]:
!pip install openai
!pip install python-dotenv


In [8]:
from openai import OpenAI
import os
from dotenv import load_dotenv

In [10]:
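# Load variables such as OPENAI_API_KEY from a local .env file into the environment
_ = load_dotenv()
# The client reads OPENAI_API_KEY from the environment
openai_client = OpenAI()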

In [19]:
def get_mongodb_query(input_data, output_data, model="gpt-3.5-turbo"):
    """Ask the chat model for a MongoDB query that turns input_data into output_data, and print its reply."""
    system_prompt = "You are a MongoDB expert and your task is to write a MongoDB query to produce the expected output for the given input data."
    user_prompt = f"""
    Input data: {input_data}
    Expected output data: {output_data}
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    # temperature=0 keeps the replies as repeatable as possible between runs
    chat_completion = openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    print(f"Assistant Response:\n{chat_completion.choices[0].message.content}")
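
`get_mongodb_query` only prints the reply. To reuse the generated query, for instance after changing the function to return `chat_completion.choices[0].message.content`, a small helper can extract the first fenced code block. This is a minimal sketch. `extract_code_block` is a hypothetical helper, and it assumes the model wraps its query in a Markdown code fence, as in the responses below.

```python
import re
from typing import Optional

FENCE = "`" * 3  # built programmatically so this snippet can sit inside a Markdown fence itself

# Capture the body between an opening fence (optional language tag) and the next closing fence.
FENCE_RE = re.compile(re.escape(FENCE) + r"[\w-]*\n(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_code_block(reply: str) -> Optional[str]:
    """Return the first fenced code block in a model reply, or None if it has none."""
    match = FENCE_RE.search(reply)
    return match.group(1).strip() if match else None


sample_reply = (
    "To produce the expected output, you can use the following MongoDB query:\n\n"
    + FENCE + "javascript\n"
    + 'db.collection.find({ "name": "Sachin" })\n'
    + FENCE + "\n\nThis query will find all documents..."
)
print(extract_code_block(sample_reply))
```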

### Example 1

In [30]:
ex1_input_data = """
[
  {
    "name": "Sachin",
    "team": "India"
  },
  {
    "name": "Sourav",
    "team": "India"
  },
  {
    "name": "Lara",
    "team": "West Indies"
  }
]
"""

ex1_output_data = """
[
 {
   "team": "India",
   "playerCount": 2
 },
 {
   "team": "West Indies",
   "playerCount": 1
 }
]
"""

In [31]:
get_mongodb_query(ex1_input_data, ex1_output_data)

Assistant Response:
To achieve the expected output, you can use the MongoDB aggregation framework with the `$group` and `$project` stages. Here's the query you can use:

```javascript
db.collection.aggregate([
  {
    $group: {
      _id: "$team",
      playerCount: { $sum: 1 }
    }
  },
  {
    $project: {
      _id: 0,
      team: "$_id",
      playerCount: 1
    }
  }
])
```

This query will group the documents by the `team` field and calculate the count of players for each team using the `$sum` operator. Then, in the `$project` stage, it will reshape the output to include only the `team` and `playerCount` fields.

The result will be:

```javascript
[
  {
    "team": "India",
    "playerCount": 2
  },
  {
    "team": "West Indies",
    "playerCount": 1
  }
]
```
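
Here is the same transformation in plain Python on the Example 1 data, to show what `$group` and `$project` compute. The code counts players per team, then renames the group key `_id` to `team`. This is a minimal sketch for checking the expected output, not a replacement for the pipeline.

```python
from collections import Counter

players = [
    {"name": "Sachin", "team": "India"},
    {"name": "Sourav", "team": "India"},
    {"name": "Lara", "team": "West Indies"},
]

# $group: { _id: "$team", playerCount: { $sum: 1 } } -> count documents per team
counts = Counter(doc["team"] for doc in players)

# $project: { _id: 0, team: "$_id", playerCount: 1 } -> expose the group key as "team"
result = [{"team": team, "playerCount": n} for team, n in counts.items()]
print(result)
```

`Counter` keeps first-seen order, but MongoDB does not guarantee the order of `$group` output. If the order matters, add a `$sort` stage such as `{ $sort: { team: 1 } }`.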


### Example 2

Write a MongoDB aggregation pipeline that finds the documents whose nested `courses` array contains duplicates (the same `courseName` more than once) and counts how many times each duplicated course appears.

In [32]:
ex2_input_data = """
[
 {
   "student": "Sachin",
   "courses": [
      {
        "courseName": "batting",
        "marks": 100
      },
      {
        "courseName": "batting",
        "marks": 50
      },
      {
        "courseName": "fielding",
        "marks": 60
      }
   ]
 },
 {
   "student": "Sourav",
   "courses": [
      {
        "courseName": "batting",
        "marks": 80
      },
      {
        "courseName": "bowling",
        "marks": 60
      },
      {
        "courseName": "fielding",
        "marks": 40
      }
   ]
 }
]
"""

ex2_output_data = """
[
  {
    "student": "Sachin",
    "duplicateCourses": [
      {
        "courseName": "batting",
        "duplicateCount": 2
      }
    ]
  }
]
"""

In [33]:
get_mongodb_query(ex2_input_data, ex2_output_data)

Assistant Response:
To produce the expected output, you can use the MongoDB aggregation framework with the following query:

```javascript
db.students.aggregate([
  {
    $unwind: "$courses"
  },
  {
    $group: {
      _id: {
        student: "$student",
        courseName: "$courses.courseName"
      },
      count: { $sum: 1 }
    }
  },
  {
    $match: {
      count: { $gt: 1 }
    }
  },
  {
    $group: {
      _id: "$_id.student",
      duplicateCourses: {
        $push: {
          courseName: "$_id.courseName",
          duplicateCount: "$count"
        }
      }
    }
  },
  {
    $project: {
      _id: 0,
      student: "$_id",
      duplicateCourses: 1
    }
  }
])
```

This query performs the following steps:

1. `$unwind` the `courses` array to create a separate document for each course.
2. `$group` the documents by `student` and `courseName`, and calculate the count of each combination.
3. `$match` the documents where the count is greater than 1.
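4. `$group` the documents by `student` again, pushing each duplicated `courseName` and its count into the `duplicateCourses` array.
5. `$project` the result to rename `_id` to `student` and drop `_id` from the output.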
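
In plain Python, the pipeline's stages reduce to a short loop on the Example 2 data. The loop counts course names per student, keeps the counts above 1, and emits one document for each student who has any duplicates. This is a minimal sketch, and it assumes each input document describes one student.

```python
from collections import Counter

students = [
    {"student": "Sachin", "courses": [
        {"courseName": "batting", "marks": 100},
        {"courseName": "batting", "marks": 50},
        {"courseName": "fielding", "marks": 60},
    ]},
    {"student": "Sourav", "courses": [
        {"courseName": "batting", "marks": 80},
        {"courseName": "bowling", "marks": 60},
        {"courseName": "fielding", "marks": 40},
    ]},
]

result = []
for doc in students:
    # $unwind + first $group: count each courseName within this student's array
    counts = Counter(course["courseName"] for course in doc["courses"])
    # $match: keep only course names that occur more than once
    duplicates = [
        {"courseName": name, "duplicateCount": n}
        for name, n in counts.items()
        if n > 1
    ]
    # second $group + $project: one output document per student with duplicates
    if duplicates:
        result.append({"student": doc["student"], "duplicateCourses": duplicates})

print(result)
```

The model's pipeline groups on `$student`, so two documents with the same student name would be merged into one result. Grouping on `$_id` and carrying the name along, for example with `$first`, avoids that.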

### Example 3

A simple find example

In [34]:
ex3_input_data = """
[
 {
    "name": "Sachin",
    "age": 50,
    "team": "India"
 },
 {
   "name": "Lara",
   "age": 52,
   "team": "India"
 }
]
"""

ex3_output_data = """
[
  {
    "name": "Sachin",
    "age": 50,
    "team": "India"
  }
]
"""

In [35]:
get_mongodb_query(ex3_input_data, ex3_output_data)

Assistant Response:
To produce the expected output, you can use the following MongoDB query:

```javascript
db.collection.find({ "name": "Sachin" })
```

This query will find all documents in the collection where the "name" field is equal to "Sachin". The result will be the document with the name "Sachin", as shown in the expected output.
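
A plain `{ field: value }` filter in `find` is an equality match on that field. The minimal plain-Python sketch below shows that behavior for top-level fields only. `find_equal` is a hypothetical helper, not a MongoDB API.

```python
players = [
    {"name": "Sachin", "age": 50, "team": "India"},
    {"name": "Lara", "age": 52, "team": "India"},
]


def find_equal(docs, query):
    """Hypothetical helper: keep docs whose top-level fields equal every key/value in query."""
    return [doc for doc in docs if all(doc.get(key) == value for key, value in query.items())]


print(find_equal(players, {"name": "Sachin"}))
```

A single input/output pair does not determine one query. On this data, `{ "age": 50 }` returns the same document, so more examples help the model infer the intended filter.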


### Example 4

Add a field

In [36]:
ex4_input_data = """
[
  {
    "_id": ObjectId("5bdb6a44d9b2d4645509db2e"),
    "crs": {
      "type": "name",
      "properties": {
        "name": "urn:ogc:def:crs:OGC:1.3:CRS84"
      }
    },
    "type": "FeatureCollection",
    "features": [
      {
        "geometry": {
          "type": "Point",
          "coordinates": [
            45,
            66
          ]
        },
        "type": "Feature",
        "id": 50,
        "properties": {
          "fogClass": 0,
          "_note": "movable",
          "fileLocation": "blah.jpg",
          "timeStamp": "2018-11-01 14:51:00",
          "predFALSE": 0.998167,
          "ipAddr": "http://abcd.ef",
          "longitude": "45",
          "predTRUE": 0.001833,
          "cameraID": "IDABC",
          "originalPath": "originalBlah.jpg",
          "location": "location1",
          "latitude": "66"
        }
      }
    ]
  }
]
"""

ex4_output_data = """
[
  {
    "_id": ObjectId("5bdb6a44d9b2d4645509db2e"),
    "crs": {
      "type": "name",
      "properties": {
        "name": "urn:ogc:def:crs:OGC:1.3:CRS84"
      }
    },
    "type": "FeatureCollection",
    "features": [
      {
        "geometry": {
          "type": "Point",
          "coordinates": [
            45,
            66
          ]
        },
        "type": "Feature",
        "id": 50,
        "properties": {
          "fogClass": 0,
          "_note": "movable",
          "fileLocation": "blah.jpg",
          "timeStamp": "2018-11-01 14:51:00",
          "predFALSE": 0.998167,
          "ipAddr": "http://abcd.ef",
          "longitude": "45",
          "predTRUE": 0.001833,
          "cameraID": "IDABC",
          "originalPath": "originalBlah.jpg",
          "location": "location1",
          "latitude": "66"
        },
        "timeMongo": ISODate("2018-11-01T14:51:00Z")
      }
    ]
  }
]
"""

In [37]:
get_mongodb_query(ex4_input_data, ex4_output_data)

Assistant Response:
To produce the expected output, you can use the following MongoDB query:

```javascript
db.collection.aggregate([
  {
    $addFields: {
      "features.0.timeMongo": {
        $dateFromString: {
          dateString: "$features.0.properties.timeStamp"
        }
      }
    }
  }
])
```

This query uses the `$addFields` aggregation stage to add a new field called `timeMongo` to the `features` array. The value of this field is obtained by converting the `timeStamp` field from a string to a date using the `$dateFromString` aggregation operator.

Note that you need to replace `collection` with the actual name of your MongoDB collection.
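
Check this answer before relying on it. In aggregation expressions, a path such as `$features.0.properties.timeStamp` does not select the first array element the way query dot notation does, and `"features.0.timeMongo"` in `$addFields` does not target element 0 either. The pipeline above therefore probably won't produce the expected output as written. A common alternative rebuilds the array with `$map` and `$mergeObjects`, parsing each element's `properties.timeStamp`. The pipeline below is written as pymongo-style Python dicts and has not been run against a server here. It assumes MongoDB 4.0 or later, where `$dateFromString` accepts a `format` option. The plain-Python part reproduces the intended result on a trimmed copy of the Example 4 data.

```python
from datetime import datetime, timezone

# Candidate pipeline (pymongo syntax, untested against a server): rebuild the
# features array with $map, merging a parsed timeMongo date into every element.
pipeline = [
    {"$addFields": {
        "features": {
            "$map": {
                "input": "$features",
                "as": "f",
                "in": {"$mergeObjects": [
                    "$$f",
                    {"timeMongo": {"$dateFromString": {
                        "dateString": "$$f.properties.timeStamp",
                        "format": "%Y-%m-%d %H:%M:%S",
                    }}},
                ]},
            }
        }
    }}
]

# Trimmed copy of the Example 4 document: only the fields the logic touches.
doc = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": 50,
         "properties": {"timeStamp": "2018-11-01 14:51:00", "cameraID": "IDABC"}},
    ],
}


def add_time_mongo(document):
    """Attach a UTC datetime 'timeMongo' to every feature, parsed from properties.timeStamp."""
    for feature in document["features"]:
        parsed = datetime.strptime(feature["properties"]["timeStamp"], "%Y-%m-%d %H:%M:%S")
        feature["timeMongo"] = parsed.replace(tzinfo=timezone.utc)
    return document


print(add_time_mongo(doc)["features"][0]["timeMongo"])
```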


### Example 5

In [38]:
ex5_input_data = """
[
  {
    "bookCategory": "Non-Fiction",
    "books": [
      {
        "bookName": "Seven Habits",
        "pages": 200,
        "authors": [
          {
            "authorName": "Sachin",
            "authorEmail": "sachin@gmail.com"
          },
          {
            "authorName": "Sourav",
            "authorEmail": "sourav@gmail.com"
          }
        ]
      },
      {
        "bookName": "One thing",
        "pages": 100,
        "authors": [
          {
            "authorName": "Sachin",
            "authorEmail": "sachin@gmail.com"
          }
        ]
      }
    ]
  },
  {
    "bookCategory": "Fiction",
    "books": [
      {
        "bookName": "Harry Potter",
        "pages": 400,
        "authors": [
          {
            "authorName": "Sachin",
            "authorEmail": "sachin@gmail.com"
          },
          {
            "authorName": "Tim",
            "authorEmail": "Tim@gmail.com"
          }
        ]
      },
      {
        "bookName": "Alchemist",
        "pages": 100,
        "authors": [
          {
            "authorName": "Sourav",
            "authorEmail": "sourav@gmail.com"
          }
        ]
      }
    ]
  }
]
"""

ex5_output_data = """
[
  {
     "authorName": "Sachin",
     "bookName": [
        "Seven Habits",
        "One thing",
        "Harry Potter"
     ] 
  },
  {
     "authorName": "Sourav",
     "bookName": [
        "Seven Habits",
        "Alchemist"
     ] 
  },
  {
     "authorName": "Tim",
     "bookName": [
        "Harry Potter"
     ] 
  }
]
"""
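
The expected output can be derived by hand with a nested walk. For each book, and then for each of its authors, append the book name to that author's list. This is the plain-Python counterpart of unwinding `books` and `books.authors` and then grouping by `authorName` with `$push`. Below is a minimal sketch on the Example 5 data.

```python
from collections import defaultdict

categories = [
    {"bookCategory": "Non-Fiction", "books": [
        {"bookName": "Seven Habits", "pages": 200, "authors": [
            {"authorName": "Sachin", "authorEmail": "sachin@gmail.com"},
            {"authorName": "Sourav", "authorEmail": "sourav@gmail.com"}]},
        {"bookName": "One thing", "pages": 100, "authors": [
            {"authorName": "Sachin", "authorEmail": "sachin@gmail.com"}]},
    ]},
    {"bookCategory": "Fiction", "books": [
        {"bookName": "Harry Potter", "pages": 400, "authors": [
            {"authorName": "Sachin", "authorEmail": "sachin@gmail.com"},
            {"authorName": "Tim", "authorEmail": "Tim@gmail.com"}]},
        {"bookName": "Alchemist", "pages": 100, "authors": [
            {"authorName": "Sourav", "authorEmail": "sourav@gmail.com"}]},
    ]},
]

books_by_author = defaultdict(list)
for category in categories:             # $unwind: "$books"
    for book in category["books"]:
        for author in book["authors"]:  # $unwind: "$books.authors"
            # $group by authorName, $push-ing each bookName
            books_by_author[author["authorName"]].append(book["bookName"])

# $project: expose the group key as authorName
result = [{"authorName": name, "bookName": titles} for name, titles in books_by_author.items()]
print(result)
```

`$push`, like `list.append` here, keeps duplicates. If the same title could reach an author's list twice (say, a book listed under two categories), `$addToSet` would collect each title once.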

In [39]:
get_mongodb_query(ex5_input_data, ex5_output_data)

Assistant Response:
To produce the expected output, you can use the MongoDB aggregation framework with the `$unwind`, `$group`, and `$project` stages. Here's the MongoDB query:

```javascript
db.collection.aggregate([
  {
    $unwind: "$books"
  },
  {
    $unwind: "$books.authors"
  },
  {
    $group: {
      _id: "$books.authors.authorName",
      bookName: { $push: "$books.bookName" }
    }
  },
  {
    $project: {
      _id: 0,
      authorName: "$_id",
      bookName: 1
    }
  }
])
```

This query first unwinds the `books` array and then unwinds the `authors` array within each book. Then, it groups the documents by the `authorName` field and uses the `$push` operator to create an array of `bookName` values for each author. Finally, it projects the desired fields and removes the `_id` field from the output.

The query will return the expected output data:

```javascript
[
  {
    "authorName": "Sachin",
    "bookName": [
      "Seven Habits",
      "One thing",
      "Harry Potter"
    ]
  },
  ...
]
```

### Example 6

In [40]:
ex6_input_data = """
[
  {
    "bookCategory": "Non-Fiction",
    "books": [
      {
        "bookName": "Seven Habits",
        "pages": 200,
        "authors": [
          {
            "authorName": "Sachin",
            "authorEmail": "sachin@gmail.com"
          },
          {
            "authorName": "Sourav",
            "authorEmail": "sourav@gmail.com"
          }
        ]
      },
      {
        "bookName": "One thing",
        "pages": 100,
        "authors": [
          {
            "authorName": "Sachin",
            "authorEmail": "sachin@gmail.com"
          }
        ]
      }
    ]
  },
  {
    "bookCategory": "Fiction",
    "books": [
      {
        "bookName": "Harry Potter",
        "pages": 400,
        "authors": [
          {
            "authorName": "Sachin",
            "authorEmail": "sachin@gmail.com"
          },
          {
            "authorName": "Tim",
            "authorEmail": "Tim@gmail.com"
          }
        ]
      },
      {
        "bookName": "Alchemist",
        "pages": 100,
        "authors": [
          {
            "authorName": "Sourav",
            "authorEmail": "sourav@gmail.com"
          }
        ]
      }
    ]
  }
]
"""

ex6_output_data = """
[
  {
    "authors": [
      {
        "authorEmail": "sachin@gmail.com",
        "authorName": "Sachin"
      },
      {
        "authorEmail": "sourav@gmail.com",
        "authorName": "Sourav"
      }
    ],
    "bookName": "Seven Habits",
    "pages": 200
  },
  {
    "authors": [
      {
        "authorEmail": "sachin@gmail.com",
        "authorName": "Sachin"
      }
    ],
    "bookName": "One thing",
    "pages": 100
  },
  {
    "authors": [
      {
        "authorEmail": "sachin@gmail.com",
        "authorName": "Sachin"
      },
      {
        "authorEmail": "Tim@gmail.com",
        "authorName": "Tim"
      }
    ],
    "bookName": "Harry Potter",
    "pages": 400
  },
  {
    "authors": [
      {
        "authorEmail": "sourav@gmail.com",
        "authorName": "Sourav"
      }
    ],
    "bookName": "Alchemist",
    "pages": 100
  }
]
"""

In [41]:
get_mongodb_query(ex6_input_data, ex6_output_data)

Assistant Response:
To produce the expected output, you can use the MongoDB aggregation framework with the `$unwind` and `$project` stages. Here's the MongoDB query:

```javascript
db.collection.aggregate([
  {
    $unwind: "$books"
  },
  {
    $unwind: "$books.authors"
  },
  {
    $project: {
      _id: 0,
      bookName: "$books.bookName",
      pages: "$books.pages",
      authors: {
        authorName: "$books.authors.authorName",
        authorEmail: "$books.authors.authorEmail"
      }
    }
  }
])
```

This query first uses the `$unwind` stage twice to flatten the `books` and `authors` arrays. Then, the `$project` stage is used to reshape the documents and include only the desired fields.

The output will be the expected output data you provided.
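
That claim does not hold for this pipeline. The second `$unwind` splits every book into one document per author, and `$project` turns `authors` into a single embedded document. The query therefore returns six documents instead of the four expected, and none of them carries an `authors` array. Unwinding only `books` and then promoting each book to the top level gives the expected shape. The sketch below writes that alternative as pymongo-style Python dicts; it has not been run against a server, so treat it as an assumption to verify. The plain-Python lines reproduce what those two stages should return on the sample data.

```python
categories = [
    {"bookCategory": "Non-Fiction", "books": [
        {"bookName": "Seven Habits", "pages": 200, "authors": [
            {"authorName": "Sachin", "authorEmail": "sachin@gmail.com"},
            {"authorName": "Sourav", "authorEmail": "sourav@gmail.com"}]},
        {"bookName": "One thing", "pages": 100, "authors": [
            {"authorName": "Sachin", "authorEmail": "sachin@gmail.com"}]},
    ]},
    {"bookCategory": "Fiction", "books": [
        {"bookName": "Harry Potter", "pages": 400, "authors": [
            {"authorName": "Sachin", "authorEmail": "sachin@gmail.com"},
            {"authorName": "Tim", "authorEmail": "Tim@gmail.com"}]},
        {"bookName": "Alchemist", "pages": 100, "authors": [
            {"authorName": "Sourav", "authorEmail": "sourav@gmail.com"}]},
    ]},
]

# Candidate pipeline (pymongo syntax, untested against a server):
# unwind only the books array, then make each book the whole output document.
pipeline = [
    {"$unwind": "$books"},
    {"$replaceRoot": {"newRoot": "$books"}},
]

# Plain-Python equivalent of those two stages: one output document per book.
result = [book for category in categories for book in category["books"]]
print([book["bookName"] for book in result])
```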


### Example 7

In [42]:
ex7_input_data = """
[
  {
    "bookName": "Seven Habits",
    "pages": 200,
    "authors": [
      {
        "authorName": "Sachin",
        "authorEmail": "sachin@gmail.com"
      },
      {
        "authorName": "Sourav",
        "authorEmail": "sourav@gmail.com"
      }
    ]
  },
  {
    "bookName": "One thing",
    "pages": 100,
    "authors": [
      {
        "authorName": "Sachin",
        "authorEmail": "sachin@gmail.com"
      }
    ]
  },
  {
    "bookName": "Harry Potter",
    "pages": 400,
    "authors": [
      {
        "authorName": "Sachin",
        "authorEmail": "sachin@gmail.com"
      },
      {
        "authorName": "Tim",
        "authorEmail": "Tim@gmail.com"
      }
    ]
  },
  {
    "bookName": "Alchemist",
    "pages": 100,
    "authors": [
      {
        "authorName": "Sourav",
        "authorEmail": "sourav@gmail.com"
      }
    ]
  }
]
"""

ex7_output_data = """
[
    {
        "bookName": "Seven Habits",
        "pages": 200,
        "authorName": "Sachin",
        "authorEmail": "sachin@gmail.com"
    },
    {
        "bookName": "Seven Habits",
        "pages": 200,
        "authorName": "Sourav",
        "authorEmail": "sourav@gmail.com"
    },
    {
        "bookName": "One thing",
        "pages": 100,
        "authorName": "Sachin",
        "authorEmail": "sachin@gmail.com"
    },
    {
        "bookName": "Harry Potter",
        "pages": 400,
        "authorName": "Sachin",
        "authorEmail": "sachin@gmail.com"
    },
    {
        "bookName": "Harry Potter",
        "pages": 400,
        "authorName": "Tim",
        "authorEmail": "Tim@gmail.com"
    },
    {
        "bookName": "Alchemist",
        "pages": 100,
        "authorName": "Sourav",
        "authorEmail": "sourav@gmail.com"
    }
]
"""

In [43]:
get_mongodb_query(ex7_input_data, ex7_output_data)

Assistant Response:
To produce the expected output, you can use the MongoDB aggregation framework with the `$unwind` and `$project` stages. Here's the query:

```javascript
db.books.aggregate([
  { $unwind: "$authors" },
  {
    $project: {
      _id: 0,
      bookName: 1,
      pages: 1,
      authorName: "$authors.authorName",
      authorEmail: "$authors.authorEmail"
    }
  }
])
```

This query first uses the `$unwind` stage to deconstruct the `authors` array, creating a separate document for each author. Then, the `$project` stage is used to reshape the documents and include only the desired fields (`bookName`, `pages`, `authorName`, and `authorEmail`). The `_id` field is excluded from the output using `_id: 0`.

The result will be the expected output data.
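
In plain Python, `$unwind` followed by `$project` amounts to a nested loop that emits one flat row per (book, author) pair. Below is a minimal sketch on the Example 7 data.

```python
books = [
    {"bookName": "Seven Habits", "pages": 200, "authors": [
        {"authorName": "Sachin", "authorEmail": "sachin@gmail.com"},
        {"authorName": "Sourav", "authorEmail": "sourav@gmail.com"}]},
    {"bookName": "One thing", "pages": 100, "authors": [
        {"authorName": "Sachin", "authorEmail": "sachin@gmail.com"}]},
    {"bookName": "Harry Potter", "pages": 400, "authors": [
        {"authorName": "Sachin", "authorEmail": "sachin@gmail.com"},
        {"authorName": "Tim", "authorEmail": "Tim@gmail.com"}]},
    {"bookName": "Alchemist", "pages": 100, "authors": [
        {"authorName": "Sourav", "authorEmail": "sourav@gmail.com"}]},
]

# $unwind: "$authors" yields one row per (book, author) pair;
# $project copies bookName/pages and lifts the author's fields to the top level.
rows = [
    {"bookName": book["bookName"], "pages": book["pages"], **author}
    for book in books
    for author in book["authors"]
]
print(len(rows))
```

By default, `$unwind` drops documents whose array is missing, null, or empty. Setting `preserveNullAndEmptyArrays: true` keeps them. This comprehension likewise skips a book with no authors.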


### Example 8

In [44]:
ex8_input_data = """
[
    {
        "studentName": "Pete",
        "subjects": [
            {
                "subjectName": "Math",
                "result": "passed"
            },
            {
                "subjectName": "Physics",
                "result": "passed"
            },
            {
                "subjectName": "Chemistry",
                "result": "failed"
            },
            {
                "subjectName": "Botany",
                "result": "failed"
            },
            {
                "subjectName": "Zoology",
                "result": "failed"
            }
        ]
    }
]
"""

ex8_output_data = """
{
    "totalResultCount": 5,
    "totalPassedCount": 2,
    "totalFailedCount": 3
}
"""

In [45]:
get_mongodb_query(ex8_input_data, ex8_output_data)

Assistant Response:
To produce the expected output, you can use the following MongoDB query:

```javascript
db.students.aggregate([
  {
    $project: {
      totalResultCount: { $size: "$subjects" },
      totalPassedCount: {
        $size: {
          $filter: {
            input: "$subjects",
            as: "subject",
            cond: { $eq: ["$$subject.result", "passed"] }
          }
        }
      },
      totalFailedCount: {
        $size: {
          $filter: {
            input: "$subjects",
            as: "subject",
            cond: { $eq: ["$$subject.result", "failed"] }
          }
        }
      }
    }
  }
])
```

This query uses the `$project` stage to calculate the total result count, total passed count, and total failed count. The `$size` operator is used to get the length of the `subjects` array. The `$filter` operator is used to filter the `subjects` array based on the `result` field, and then the `$size` operator is used again to get the length of the filtered array.
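
Here are the same counts in plain Python. `$size` corresponds to `len`, and `$size` of a `$filter` corresponds to counting the matching elements. Below is a minimal sketch on the Example 8 data.

```python
from collections import Counter

student = {
    "studentName": "Pete",
    "subjects": [
        {"subjectName": "Math", "result": "passed"},
        {"subjectName": "Physics", "result": "passed"},
        {"subjectName": "Chemistry", "result": "failed"},
        {"subjectName": "Botany", "result": "failed"},
        {"subjectName": "Zoology", "result": "failed"},
    ],
}

# Tally results once; Counter returns 0 for a result value that never occurs.
results = Counter(subject["result"] for subject in student["subjects"])

summary = {
    "totalResultCount": len(student["subjects"]),  # $size: "$subjects"
    "totalPassedCount": results["passed"],         # $size of $filter(result == "passed")
    "totalFailedCount": results["failed"],         # $size of $filter(result == "failed")
}
print(summary)
```

As written, the model's `$project` stage also keeps each document's `_id`, because `$project` includes `_id` unless it is explicitly excluded. Adding `_id: 0` would make the result match the expected output exactly.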

### Example 9

In [46]:
ex9_input_data = """
[
    {
        "studentName": "Pete",
        "subjects": [
            {
                "subjectName": "Math",
                "result": "passed"
            },
            {
                "subjectName": "Physics",
                "result": "passed"
            },
            {
                "subjectName": "Chemistry",
                "result": "failed"
            },
            {
                "subjectName": "Botany",
                "result": "failed"
            },
            {
                "subjectName": "Zoology",
                "result": "failed"
            }
        ]
    },
    {
        "studentName": "Mazu",
        "subjects": [
            {
                "subjectName": "Math",
                "result": "failed"
            },
            {
                "subjectName": "English",
                "result": "passed"
            },
            {
                "subjectName": "Commerce",
                "result": "passed"
            },
            {
                "subjectName": "Biology",
                "result": "passed"
            }
        ]
    } 
]
"""

ex9_output_data = """
{
    "totalStudentsCount": 2,
    "totalResultCount": 9,
    "totalPassedCount": 5,
    "totalFailedCount": 4,
    "totalUniqueSubjectsCount": 8,
    "totalSubjectsCount": 9
}
"""
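
Two of these expected figures need more care than a per-document `$project`. First, `totalStudentsCount` is a count of input documents. Second, `totalUniqueSubjectsCount` must deduplicate subject names across all students. Math appears for both students, so the 9 subject entries give 8 distinct names; counting distinct names per student and summing would give 5 + 4 = 9. The plain-Python sketch below derives every expected value from the sample data.

```python
students = [
    {"studentName": "Pete", "subjects": [
        {"subjectName": "Math", "result": "passed"},
        {"subjectName": "Physics", "result": "passed"},
        {"subjectName": "Chemistry", "result": "failed"},
        {"subjectName": "Botany", "result": "failed"},
        {"subjectName": "Zoology", "result": "failed"},
    ]},
    {"studentName": "Mazu", "subjects": [
        {"subjectName": "Math", "result": "failed"},
        {"subjectName": "English", "result": "passed"},
        {"subjectName": "Commerce", "result": "passed"},
        {"subjectName": "Biology", "result": "passed"},
    ]},
]

# Flatten every student's subjects into one list (the effect of $unwind across documents).
all_subjects = [s for student in students for s in student["subjects"]]

summary = {
    "totalStudentsCount": len(students),  # one per input document
    "totalResultCount": len(all_subjects),
    "totalPassedCount": sum(s["result"] == "passed" for s in all_subjects),
    "totalFailedCount": sum(s["result"] == "failed" for s in all_subjects),
    # Deduplicate names across ALL students: "Math" is shared, so 9 entries -> 8 names.
    "totalUniqueSubjectsCount": len({s["subjectName"] for s in all_subjects}),
    "totalSubjectsCount": len(all_subjects),
}
print(summary)
```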

In [47]:
get_mongodb_query(ex9_input_data, ex9_output_data)

Assistant Response:
To produce the expected output, you can use the following MongoDB query:

```javascript
db.students.aggregate([
  {
    $project: {
      totalStudentsCount: 1,
      totalResultCount: { $size: "$subjects" },
      totalPassedCount: {
        $size: {
          $filter: {
            input: "$subjects",
            as: "subject",
            cond: { $eq: ["$$subject.result", "passed"] }
          }
        }
      },
      totalFailedCount: {
        $size: {
          $filter: {
            input: "$subjects",
            as: "subject",
            cond: { $eq: ["$$subject.result", "failed"] }
          }
        }
      },
      totalUniqueSubjectsCount: { $size: { $setUnion: "$subjects.subjectName" } }
    }
  },
  {
    $group: {
      _id: null,
      totalStudentsCount: { $sum: "$totalStudentsCount" },
      totalResultCount: { $sum: "$totalResultCount" },
      totalPassedCount: { $sum: "$totalPassedCount" },
      totalFailedCount: { $sum: "$totalFailedCount