# StudySphere Quality Control (QC) Module
This notebook showcases the Quality Control (QC) module for StudySphere. It includes dummy data to illustrate QC mechanisms for filtering and ranking user-generated content.

## Module Objective
The QC module aims to ensure that only high-quality content is retained on the platform by:
- Removing low-quality or downvoted questions
- Highlighting the top-rated answers
- Ranking questions by popularity
- Banning users who consistently contribute low-quality content


## Input Data Structure
### Questions Data
The platform collects questions from users in the following JSON format:
- **Username**: Unique username of the user posting the question.
- **Question Type**: Type of question (e.g., Multiple Choice, Short Answer).
- **Question**: The text of the question.
- **QuestionId**: Unique identifier for the question.
- **Answer**: Suggested answer provided by the user.

Example:
```json
[
  {"Username": "User1", "Question Type": "Multiple Choice", "Question": "What elements are in the periodic table?", "QuestionId": 1, "Answer": "Carbon, Oxygen, Nitrogen"},
  {"Username": "User2", "Question Type": "Short Answer", "Question": "What is the mitochondria?", "QuestionId": 2, "Answer": "The powerhouse of the cell."}
]
```
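A minimal sketch of how incoming question records could be checked against the field list above before they enter the QC pipeline. The helper `missing_fields`, the constant `REQUIRED_FIELDS`, and the incomplete second record are hypothetical and serve only as illustration.

```python
import json

# Field names taken from the Questions Data description above
REQUIRED_FIELDS = {"Username", "Question Type", "Question", "QuestionId", "Answer"}

def missing_fields(record):
    """Return the set of required fields absent from a question record."""
    return REQUIRED_FIELDS - set(record)

# Hypothetical payload: the second record is deliberately incomplete
raw = '''[
  {"Username": "User2", "Question Type": "Short Answer", "Question": "What is the mitochondria?", "QuestionId": 2, "Answer": "The powerhouse of the cell."},
  {"Username": "User9", "Question": "Incomplete record", "QuestionId": 9}
]'''
records = json.loads(raw)

# Map each problematic QuestionId to the fields it lacks
problems = {r["QuestionId"]: sorted(missing_fields(r)) for r in records if missing_fields(r)}
print(problems)
```

Records that appear in `problems` could be rejected or sent back to the user before any voting takes place.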

### Voting Data Structure
- **Question Votes**: JSON format where each vote is linked to a question.
  - `Upvote` is `true` for an upvote and `false` for a downvote.

Example:
```json
[{"Username": "User5", "QuestionId": 3, "Upvote": true}, {"Username": "User6", "QuestionId": 2, "Upvote": false}]
```

- **Answer Votes**: JSON format where users vote on a specific answer to a question.

Example:
```json
[{"Username": "User7", "QuestionId": 3, "Answer": "Quick Sort"}]
```
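The question votes described above can be tallied into per-question `Upvotes` and `Downvotes` counts, which is the shape the dummy data in Step 1 assumes. The following is a sketch only: the vote records are hypothetical, and the intermediate names `votes_df` and `vote_counts` are not part of the notebook.

```python
import pandas as pd

# Hypothetical vote records in the Question Votes format (Python booleans)
question_votes = [
    {"Username": "User5", "QuestionId": 3, "Upvote": True},
    {"Username": "User6", "QuestionId": 2, "Upvote": False},
    {"Username": "User8", "QuestionId": 3, "Upvote": True},
    {"Username": "User9", "QuestionId": 2, "Upvote": True},
]
votes_df = pd.DataFrame(question_votes)

# Split the boolean flag into two indicator columns so both can be summed
votes_df["Upvotes"] = votes_df["Upvote"].astype(int)
votes_df["Downvotes"] = (~votes_df["Upvote"]).astype(int)

# One row per question with its upvote and downvote totals
vote_counts = (
    votes_df.groupby("QuestionId")[["Upvotes", "Downvotes"]]
    .sum()
    .reset_index()
)
print(vote_counts)
```

The resulting table could then be merged with the question records on `QuestionId`. This sketch does not deduplicate repeated votes from the same user.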

## Step 1: Import Libraries and Define Dummy Data
Let's start by importing necessary libraries and creating dummy data based on the data structure defined above.

In [None]:

import pandas as pd

# Sample data of questions submitted by users
questions_data = [
    {"Username": "User1", "QuestionId": 1, "QuestionType": "Multiple Choice", "Question": "What elements are in the periodic table?", "Upvotes": 3, "Downvotes": 2, "Answer": "Carbon, Oxygen, Nitrogen"},
    {"Username": "User2", "QuestionId": 2, "QuestionType": "Short Answer", "Question": "What is the mitochondria?", "Upvotes": 1, "Downvotes": 4, "Answer": "The powerhouse of the cell."},
    {"Username": "User3", "QuestionId": 3, "QuestionType": "Short Answer", "Question": "What algorithm can be used to sort a list?", "Upvotes": 5, "Downvotes": 0, "Answer": "Quick Sort"},
    {"Username": "User4", "QuestionId": 4, "QuestionType": "Short Answer", "Question": "What algorithm can be used to sort a list?", "Upvotes": 2, "Downvotes": 5, "Answer": "Merge Sort"}
]

# Convert to DataFrame
questions_df = pd.DataFrame(questions_data)

# Display sample input data for QC
questions_df


## Step 2: Implementing Quality Control Rules
### QC Rule 1: Display Top 2-3 Answers
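This rule ranks answers by upvote count so that the top two or three answers are displayed prominently. The dummy data stores a single suggested answer per question, so the Step 2 code cell does not implement this rule.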
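One way this rule might be sketched from records in the Answer Votes format, assuming each record counts as one vote for the named answer. The function `top_answers` and the sample votes are hypothetical and not part of the module.

```python
from collections import Counter, defaultdict

def top_answers(answer_votes, n=3):
    """Return {QuestionId: [(answer, vote_count), ...]} with at most n answers each."""
    counts = defaultdict(Counter)
    for vote in answer_votes:
        counts[vote["QuestionId"]][vote["Answer"]] += 1
    # most_common(n) orders answers by descending vote count
    return {qid: counter.most_common(n) for qid, counter in counts.items()}

# Hypothetical answer votes for a single question
answer_votes = [
    {"Username": "User7", "QuestionId": 3, "Answer": "Quick Sort"},
    {"Username": "User8", "QuestionId": 3, "Answer": "Merge Sort"},
    {"Username": "User9", "QuestionId": 3, "Answer": "Quick Sort"},
    {"Username": "User10", "QuestionId": 3, "Answer": "Bubble Sort"},
    {"Username": "User11", "QuestionId": 3, "Answer": "Quick Sort"},
    {"Username": "User12", "QuestionId": 3, "Answer": "Merge Sort"},
]

top = top_answers(answer_votes, n=2)
print(top)
```

Here `n` controls how many answers are shown per question. Answers with equal vote counts keep the order in which they were first seen.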

### QC Rule 2: Rank Questions by Votes
Questions are sorted by net vote score (upvotes minus downvotes) so that popular, well-received questions appear first.

### QC Rule 3: Remove Questions with Too Many Downvotes
Questions whose downvote count meets or exceeds a configurable threshold are flagged for removal.

### QC Rule 4: Ban Users Who Post Irrelevant Questions
Users whose number of flagged questions meets or exceeds a second threshold are flagged for potential banning.



In [None]:

# Define QC thresholds
downvote_threshold = 3  # Threshold for flagging a question as low-quality
user_ban_threshold = 2  # Threshold for banning users with repeated low-quality content

# Step 2a: Flag questions whose downvotes meet or exceed the threshold
# (vectorized comparison; equivalent to a row-wise apply)
questions_df['FlaggedForRemoval'] = questions_df['Downvotes'] >= downvote_threshold

# Step 2b: Aggregate user data to check for potential bans
# Counting flagged questions per user
user_flags = questions_df[questions_df['FlaggedForRemoval']].groupby('Username').size().reset_index(name='FlaggedQuestionsCount')
user_flags['BanUser'] = user_flags['FlaggedQuestionsCount'] >= user_ban_threshold

# Step 2c: Rank questions by net vote score (upvotes minus downvotes)
questions_df['VoteScore'] = questions_df['Upvotes'] - questions_df['Downvotes']
questions_ranked = questions_df.sort_values(by='VoteScore', ascending=False)

# Displaying results
print("Questions flagged for removal:\n", questions_df[['QuestionId', 'Question', 'FlaggedForRemoval']])
print("\nUsers flagged for potential ban:\n", user_flags)
print("\nQuestions ranked by votes:\n", questions_ranked[['QuestionId', 'Question', 'VoteScore']])


## Output Interpretation
- **Flagged Questions**: Questions with `FlaggedForRemoval = True` are identified as low-quality content for potential removal.
- **Flagged Users**: Users with `BanUser = True` have repeatedly submitted low-quality questions and may face restrictions.
- **Ranked Questions**: Questions are ranked by `VoteScore` (upvotes minus downvotes), so the most popular questions appear first.
- **Top Answers**: Upvotes determine the prominence of answers displayed to users, with the top 2-3 answers shown by rank.
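
In the dummy data each user posts exactly one question, so no user reaches `user_ban_threshold = 2` and `BanUser` is `False` for every flagged user. The sketch below applies the same flagging and ban logic to hypothetical users (`UserA`, `UserB`) who each post two questions, to show when the ban flag is set.

```python
import pandas as pd

# Same thresholds as the QC cell
downvote_threshold = 3
user_ban_threshold = 2

# Hypothetical data: UserA has two heavily downvoted questions, UserB has one
sample = pd.DataFrame([
    {"Username": "UserA", "QuestionId": 10, "Downvotes": 4},
    {"Username": "UserA", "QuestionId": 11, "Downvotes": 6},
    {"Username": "UserB", "QuestionId": 12, "Downvotes": 5},
    {"Username": "UserB", "QuestionId": 13, "Downvotes": 0},
])

# Flag questions, then count flagged questions per user
sample["FlaggedForRemoval"] = sample["Downvotes"] >= downvote_threshold
flag_counts = (
    sample[sample["FlaggedForRemoval"]]
    .groupby("Username")
    .size()
    .reset_index(name="FlaggedQuestionsCount")
)
flag_counts["BanUser"] = flag_counts["FlaggedQuestionsCount"] >= user_ban_threshold
print(flag_counts)
```

Only `UserA` reaches the ban threshold here. Users with no flagged questions do not appear in `flag_counts` at all.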

These rules and aggregations help maintain a high standard of content on StudySphere through vote-based quality control.