Synthetic datasets with varied sentiments are a practical way to build and test sentiment analysis pipelines before real data is available. This Google Colab notebook walks through generating 200 rows of such data and saving them as a CSV file.

### Google Colab Notebook: Synthetic Course Evaluation Data Generation

This notebook generates a synthetic dataset of course evaluation responses, including `student_id`, `course_id`, and `open_text_response` with varied sentiments, suitable for text analytics and sentiment analysis testing.

-----

### **Step 1: Install Necessary Libraries**

First, we need to install the `Faker` library, which is excellent for generating realistic-looking fake data like IDs. `pandas` is typically pre-installed in Google Colab, but we'll include it for completeness.

In [1]:
# Install Faker (pandas is listed for completeness; Colab usually ships with it)
!pip install Faker pandas

Collecting Faker
  Downloading faker-37.4.0-py3-none-any.whl.metadata (15 kB)
Downloading faker-37.4.0-py3-none-any.whl (1.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 MB 26.4 MB/s eta 0:00:00
Installing collected packages: Faker
Successfully installed Faker-37.4.0


-----

### **Step 2: Import Libraries**

Next, we import the required libraries: `pandas` for data manipulation, `random` for choosing random elements, and `Faker` for generating unique IDs.

In [2]:
import pandas as pd
import random
from faker import Faker

-----

### **Step 3: Initialize Faker and Define Sentiments**

We initialize the `Faker` object that will generate unique student and course IDs. We also define a list of 50 open-text responses covering a range of sentiments, from very positive through neutral to very negative.

In [3]:
# Initialize Faker for generating student and course IDs
fake = Faker()

# Define a variety of open-text responses reflecting different sentiments
sentiments = [
    "This course was incredibly insightful and well-structured. I learned a great deal!",
    "The instructor was excellent and very engaging. Highly recommend this class.",
    "I found the material somewhat dry, but the assignments were helpful.",
    "This course exceeded my expectations! The content was relevant and delivery was superb.",
    "A bit challenging at times, but ultimately rewarding. Good support from the TA.",
    "The lectures were often hard to follow, and I felt lost more than once.",
    "Definitely one of the best courses I've taken. Everything was clearly explained.",
    "I wish there was more interaction in class. It felt very lecture-heavy.",
    "The workload was manageable, and the topics were interesting.",
    "Not very organized. I struggled to keep track of deadlines and expectations.",
    "Fantastic course! I particularly enjoyed the practical exercises.",
    "The readings were extensive, but they provided a solid foundation.",
    "I felt the pace was too fast, making it difficult to fully grasp concepts.",
    "Good overall, but I think some topics could have been explored in more depth.",
    "This course completely changed my perspective on the subject. Amazing!",
    "The instructor was approachable and always willing to help.",
    "Some of the assignments felt disconnected from the lecture material.",
    "I would recommend this course to anyone interested in the topic. Very enriching.",
    "Honestly, I was quite bored throughout the course. Needs more engaging content.",
    "The group projects were a great way to apply what we learned.",
    "The online platform was clunky and made it hard to access resources.",
    "The feedback on assignments was constructive and helped me improve.",
    "Could use more real-world examples to make the theory more relatable.",
    "Overall a positive experience, although a bit challenging.",
    "The course content was outdated in some areas.",
    "Excellent teaching, clear objectives, and fair assessments.",
    "I struggled with the technical aspects; more hands-on help would have been appreciated.",
    "Learned a lot and felt well-supported by the teaching staff.",
    "The class discussions were lively and added a lot to my understanding.",
    "Too much reliance on self-study; I prefer more direct instruction.",
    "One of my favorite courses so far!",
    "Average course, nothing particularly stood out.",
    "Very disappointing. I expected more from this course.",
    "Highly relevant to my career goals.",
    "The grading criteria were unclear at times.",
    "A solid foundation, but could be more advanced.",
    "Instructor was passionate, but the material was dry.",
    "I enjoyed the guest speakers.",
    "The course website was well-organized.",
    "Needed more opportunities for feedback.",
    "It was okay. Not bad, not great.",
    "This course was a waste of time.",
    "Challenging but rewarding.",
    "Too much busy work.",
    "Very practical and useful.",
    "The instructor was a bit unapproachable.",
    "I learned so much!",
    "Confusing and disorganized.",
    "Highly recommended for anyone in the field.",
    "The lectures were engaging and easy to follow."
]
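
Because the responses above are stored as bare strings, a downstream sentiment test has no ground truth to compare against. One possible extension, sketched below, pairs each response with a label and samples with `random.choices` so the sentiment mix can be controlled. The names `labeled_responses`, `label_weights`, and `sample_response`, the labels themselves, and the weights are illustrative assumptions, not part of the original notebook.

```python
import random

# Hypothetical labels assigned by hand to a few of the responses above
labeled_responses = {
    "positive": [
        "Fantastic course! I particularly enjoyed the practical exercises.",
        "I learned so much!",
    ],
    "neutral": [
        "Average course, nothing particularly stood out.",
        "It was okay. Not bad, not great.",
    ],
    "negative": [
        "This course was a waste of time.",
        "Confusing and disorganized.",
    ],
}

# Assumed target mix: half positive, the rest split between neutral and negative
label_weights = {"positive": 0.5, "neutral": 0.2, "negative": 0.3}


def sample_response(rng):
    """Pick a label according to label_weights, then a response with that label."""
    labels = list(label_weights)
    weights = [label_weights[label] for label in labels]
    label = rng.choices(labels, weights=weights, k=1)[0]
    return label, rng.choice(labeled_responses[label])


rng = random.Random(42)  # local generator so the global random state is untouched
samples = [sample_response(rng) for _ in range(200)]
```

Keeping the label next to the text means a later step could store it as a fourth column and compare it with a model's predictions.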

-----

### **Step 4: Generate Data**

Now, we'll generate 200 rows of data. For each row, we'll create a unique `student_id`, a unique `course_id` (a random department prefix followed by a unique number), and an `open_text_response` drawn at random, with replacement, from our predefined list. Since the list holds 50 responses, each one appears about four times on average.

In [4]:
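# Generate 200 rows of data
departments = ['CS', 'MA', 'EN', 'HI', 'PY']
data = []
for _ in range(200):
    # fix_len=True keeps every ID at the full digit count; the default
    # (fix_len=False) can also return shorter numbers such as 4821 or 7
    student_id = fake.unique.random_number(digits=6, fix_len=True)
    course_number = fake.unique.random_number(digits=3, fix_len=True)
    course_id = f"{random.choice(departments)}{course_number}"
    response = random.choice(sentiments)
    data.append([student_id, course_id, response])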
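
The loop above is unseeded, so it produces different IDs on every run. If reproducible output matters, one option is to draw the IDs from a seeded generator. The sketch below swaps Faker for the standard library's `random.sample`, which returns distinct values by construction. The seed value and the helper name `make_ids` are assumptions for illustration.

```python
import random


def make_ids(n_rows, seed=42):
    """Return n_rows (student_id, course_id) pairs with no repeated IDs."""
    rng = random.Random(seed)  # seeded, so the same seed gives the same IDs
    departments = ['CS', 'MA', 'EN', 'HI', 'PY']
    # sample() draws without replacement, so every number is distinct
    student_ids = rng.sample(range(100_000, 1_000_000), n_rows)
    course_numbers = rng.sample(range(100, 1_000), n_rows)
    course_ids = [f"{rng.choice(departments)}{num}" for num in course_numbers]
    return list(zip(student_ids, course_ids))


ids = make_ids(200)
```

To stay with Faker instead, calling the class-level `Faker.seed()` and `random.seed()` before the loop should serve the same purpose.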

-----

### **Step 5: Create Pandas DataFrame**

After generating the raw data, we convert it into a Pandas DataFrame for easy manipulation and saving.

In [5]:
# Create a Pandas DataFrame
df = pd.DataFrame(data, columns=['student_id', 'course_id', 'open_text_response'])

# Display the first few rows of the DataFrame to verify
print("First 5 rows of the generated DataFrame:")
print(df.head())
print(f"\nTotal rows generated: {len(df)}")

First 5 rows of the generated DataFrame:
   student_id course_id                                 open_text_response
0      165567     MA104                 One of my favorite courses so far!
1      814981     EN800  The class discussions were lively and added a ...
2      262868     PY638                         Challenging but rewarding.
3      605941     EN266  A bit challenging at times, but ultimately rew...
4      708665     HI147  This course was incredibly insightful and well...

Total rows generated: 200
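
Beyond eyeballing `df.head()`, a few quick checks can confirm that the IDs really are unique and that no field is missing. The following is a minimal sketch on a small stand-in frame (the three rows and the name `df_check` are made up for illustration); in the notebook the same checks would run against `df`.

```python
import pandas as pd

# Small stand-in for the generated frame; rows are illustrative only
df_check = pd.DataFrame(
    [
        [165567, "MA104", "One of my favorite courses so far!"],
        [814981, "EN800", "Challenging but rewarding."],
        [262868, "PY638", "Challenging but rewarding."],
    ],
    columns=["student_id", "course_id", "open_text_response"],
)

# IDs should never repeat, and no field should be missing
ids_unique = df_check["student_id"].is_unique and df_check["course_id"].is_unique
no_missing = not df_check.isna().any().any()

# Responses are drawn with replacement, so repeats here are expected
response_counts = df_check["open_text_response"].value_counts()
print(response_counts.head())
```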


-----

### **Step 6: Save Data to CSV File**

Finally, we save the generated DataFrame as a CSV file. In Google Colab, the file is written to the session's temporary file system, which is cleared when the runtime is recycled. To keep it, download it directly or mount your Google Drive and save it there.

In [6]:
# Define the output filename
output_filename = 'course_evaluations.csv'

# Save the DataFrame to a CSV file
df.to_csv(output_filename, index=False)

print(f"\nSuccessfully generated {len(df)} rows of synthetic course evaluation data and saved to '{output_filename}'")

# Download the file to your local machine (Colab only; comment out these two lines to skip)
from google.colab import files
files.download(output_filename)

# To save to Google Drive (optional - uncomment and run if you want to save to Drive)
# from google.colab import drive
# drive.mount('/content/drive')
# drive_path = f'/content/drive/My Drive/{output_filename}'
# df.to_csv(drive_path, index=False)
# print(f"\nAlso saved to Google Drive at: '{drive_path}'")


Successfully generated 200 rows of synthetic course evaluation data and saved to 'course_evaluations.csv'
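
To confirm that the file was written as intended, it can be read back and compared with the frame in memory. This sketch uses a temporary directory and a two-row stand-in frame named `original` (both assumptions for illustration) rather than the notebook's `df`.

```python
import os
import tempfile

import pandas as pd

# Two-row stand-in for the generated data; values are illustrative only
original = pd.DataFrame(
    {
        "student_id": [165567, 814981],
        "course_id": ["MA104", "EN800"],
        "open_text_response": [
            "One of my favorite courses so far!",
            "A bit challenging at times, but ultimately rewarding.",
        ],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "course_evaluations.csv")
    original.to_csv(path, index=False)  # same call as in the cell above
    reloaded = pd.read_csv(path)        # responses containing commas were quoted on write

# Compare column names and cell values after the round trip
round_trip_ok = (
    list(reloaded.columns) == list(original.columns)
    and reloaded.values.tolist() == original.values.tolist()
)
```

The second response contains a comma, which makes it a useful check that quoting survives the round trip.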



-----