# Generate Test Dataset

This notebook draws a small, class-balanced test set from `ai_human_ds.csv`:
- 60 Human samples (label = 0)
- 60 AI samples (label = 1)
- Total: 120 samples

The selected data will be saved as `gtest_data.csv`.

## Step 1: Import Libraries and Load Data

In [2]:
import pandas as pd
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Load the dataset
df = pd.read_csv('ai_human_ds.csv')

# Display basic information
print(f"Total samples: {len(df)}")
print(f"\nColumn names: {list(df.columns)}")
print(f"\nLabel distribution (generated column):")
print(f"  0 = Human, 1 = AI")
print(df['generated'].value_counts())
print(f"\nFirst few rows:")
df.head()

Total samples: 1460

Column names: ['text', 'generated']

Label distribution (generated column):
  0 = Human, 1 = AI
generated
0    1375
1      85
Name: count, dtype: int64

First few rows:


Unnamed: 0,text,generated
0,"Machine learning, a subset of artificial intel...",1
1,"A decision tree, a prominent machine learning ...",1
2,"Education, a cornerstone of societal progress,...",1
3,"Computers, the backbone of modern technology, ...",1
4,"Chess, a timeless game of strategy and intelle...",1


## Step 2: Sample 60 from Each Class

In [3]:
# Separate data by label (generated column: 0 = Human, 1 = AI)
human_samples = df[df['generated'] == 0]
ai_samples = df[df['generated'] == 1]

print(f"Available Human samples (generated=0): {len(human_samples)}")
print(f"Available AI samples (generated=1): {len(ai_samples)}")

# The dataset is imbalanced (only 85 AI samples), so check availability first:
# - if at least 60 AI samples exist, draw 60 from each class as requested
# - otherwise, use every AI sample and draw the same number of human samples

if len(ai_samples) >= 60:
    # Use the requested 60 samples from each
    n_samples = 60
    human_selected = human_samples.sample(n=n_samples, random_state=42)
    ai_selected = ai_samples.sample(n=n_samples, random_state=42)
    print(f"\n✅ Using {n_samples} samples from each class as requested")
else:
    # Use all available AI samples and match with same number of human samples
    n_samples = len(ai_samples)
    human_selected = human_samples.sample(n=n_samples, random_state=42)
    ai_selected = ai_samples
    print(f"\n⚠️  Only {n_samples} AI samples available. Using all AI samples and {n_samples} human samples for balance.")

# Combine the selected samples
gtest_df = pd.concat([human_selected, ai_selected], ignore_index=True)

# Shuffle the combined dataset
gtest_df = gtest_df.sample(frac=1, random_state=42).reset_index(drop=True)

# Rename 'generated' column to 'label' for clarity
gtest_df = gtest_df.rename(columns={'generated': 'label'})

print(f"\n✅ Selected {len(gtest_df)} samples:")
print(f"   - Human (label=0): {len(gtest_df[gtest_df['label'] == 0])}")
print(f"   - AI (label=1): {len(gtest_df[gtest_df['label'] == 1])}")
print(f"\nDataFrame shape: {gtest_df.shape}")
print(f"Columns: {list(gtest_df.columns)}")
gtest_df.head()

Available Human samples (generated=0): 1375
Available AI samples (generated=1): 85

✅ Using 60 samples from each class as requested

✅ Selected 120 samples:
   - Human (label=0): 60
   - AI (label=1): 60

DataFrame shape: (120, 2)
Columns: ['text', 'label']


Unnamed: 0,text,label
0,"We, the people of the United States, live in a...",0
1,"Last year in 2014, the earth had the warmest t...",0
2,"Dear state senator, The Electoral College that...",0
3,"Congestion, the amount of car traffic in a spe...",0
4,"The United States has been known for life, lib...",0
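The per-class sampling in this step can be wrapped in a reusable function. The following is a minimal sketch on toy data, assuming a hypothetical helper `sample_balanced` (not part of the original notebook) built on pandas' `groupby(...).sample`. It caps the draw at the size of the smallest class, which mirrors the fallback branch above.

```python
import pandas as pd


def sample_balanced(df, label_col, n_per_class, seed=42):
    """Draw the same number of rows from every class, then shuffle.

    Hypothetical helper: n is capped at the size of the smallest class so
    the result stays balanced even when one class is scarce.
    """
    smallest = int(df[label_col].value_counts().min())
    n = min(n_per_class, smallest)
    picked = df.groupby(label_col).sample(n=n, random_state=seed)
    # Shuffle so the classes are interleaved, and rebuild a clean index
    return picked.sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy data: 10 human rows (0) and 4 AI rows (1)
toy = pd.DataFrame({
    "text": [f"sample {i}" for i in range(14)],
    "generated": [0] * 10 + [1] * 4,
})

balanced = sample_balanced(toy, "generated", n_per_class=6)
print(balanced["generated"].value_counts().to_dict())
```

Here only 4 AI rows exist, so the request for 6 per class is reduced to 4 per class rather than raising an error.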


## Step 3: Inspect the Selected Data

In [4]:
# Display some statistics
print("Label distribution in test set:")
print(gtest_df['label'].value_counts())
print("\n" + "="*80)

# Show a few examples from each class
print("\nExample Human texts (label=0):")
print("="*80)
# Number the examples 1..3 instead of reusing the shuffled DataFrame index
for n, (_, row) in enumerate(gtest_df[gtest_df['label'] == 0].head(3).iterrows(), start=1):
    print(f"\nSample {n}:")
    print(row['text'][:300] + "..." if len(row['text']) > 300 else row['text'])

print("\n" + "="*80)
print("\nExample AI texts (label=1):")
print("="*80)
# Number the examples 1..3 instead of reusing the shuffled DataFrame index
for n, (_, row) in enumerate(gtest_df[gtest_df['label'] == 1].head(3).iterrows(), start=1):
    print(f"\nSample {n}:")
    print(row['text'][:300] + "..." if len(row['text']) > 300 else row['text'])

Label distribution in test set:
label
0    60
1    60
Name: count, dtype: int64


Example Human texts (label=0):

Sample 1:
We, the people of the United States, live in a car happy society. Every teenager can't wait until their 16th birthday because, for mostly every kid, that means that they go get their drivers license and possibly their very own car. Also, adults always look at getting a nice luxurious car and the top...

Sample 2:
Last year in 2014, the earth had the warmest temperature in recorded history. Needless to say, this is due to greenhouse gases, like carbon dioxide. One of the major reasons behind an increase in greenhouse gases is due to the use of cars and motorcycles, which release high amounts of carbon dioxide...

Sample 3:
Dear state senator, The Electoral College that was established by the founding fathers in the constitution is important to all of us. Every candidate that is running for President in each state has its own group of electors that the political par
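The truncation expression in this cell relies on Python parsing `a + "..." if cond else b` as `(a + "...") if cond else b`. A small sketch, assuming a hypothetical helper named `preview`, makes that intent explicit and reusable:

```python
def preview(text, limit=300):
    """Return text unchanged if it fits, else its first `limit` characters plus '...'."""
    return text if len(text) <= limit else text[:limit] + "..."


short = "A short essay."
long_text = "word " * 100  # 500 characters

print(preview(short))
print(len(preview(long_text)))
```

The same helper could replace both `print(...)` calls in the loops, with `limit=400` for the final inspection step.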

## Step 4: Save to CSV File

In [5]:
# Save to CSV file
output_file = 'gtest_data.csv'
gtest_df.to_csv(output_file, index=False)

print(f"✅ Test dataset saved to: {output_file}")
print(f"   Total samples: {len(gtest_df)}")
print(f"   Columns: {list(gtest_df.columns)}")

# Verify the saved file
import os
if os.path.exists(output_file):
    file_size = os.path.getsize(output_file) / 1024  # Size in KB
    print(f"   File size: {file_size:.2f} KB")
    print("\n✓ File saved successfully!")

✅ Test dataset saved to: gtest_data.csv
   Total samples: 120
   Columns: ['text', 'label']
   File size: 272.14 KB

✓ File saved successfully!
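The `os.path.exists` check confirms that a file was written, but not that it parses back to the same rows. A minimal sketch of a round-trip check follows, assuming a toy stand-in for `gtest_df` and a hypothetical temporary file name:

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for gtest_df, with commas, quotes, and a newline in the text
toy = pd.DataFrame({
    "text": ['Dear senator, "hello"', "line one\nline two", "plain text"],
    "label": [0, 1, 0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip_check.csv")  # hypothetical file name
    toy.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Compare shape and contents column by column
same_shape = reloaded.shape == toy.shape
same_text = reloaded["text"].tolist() == toy["text"].tolist()
same_labels = reloaded["label"].tolist() == toy["label"].tolist()
print(same_shape, same_text, same_labels)
```

Essay text often contains commas and quotation marks, so a check like this is a cheap way to confirm that quoting survived the save.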


## Step 5: DataFrame Available for Inspection

The `gtest_df` DataFrame remains in memory for inspection. The following commands are useful for exploring it:

- `gtest_df.head()` - View first few rows
- `gtest_df.tail()` - View last few rows
- `gtest_df.info()` - Get DataFrame info
- `gtest_df.describe()` - Get statistical summary
- `gtest_df[gtest_df['label'] == 0]` - Filter human texts
- `gtest_df[gtest_df['label'] == 1]` - Filter AI texts
- `gtest_df.sample(5)` - Random sample of 5 rows

In [6]:
# Example: Display DataFrame info
print("DataFrame Information:")
print("="*80)
gtest_df.info()

print("\n" + "="*80)
print("\nDataFrame Summary:")
print("="*80)
print(f"Total rows: {len(gtest_df)}")
print(f"Total columns: {len(gtest_df.columns)}")
print(f"\nColumn names: {list(gtest_df.columns)}")
print(f"\nLabel counts:")
print(gtest_df['label'].value_counts().to_dict())

DataFrame Information:
<class 'pandas.DataFrame'>
RangeIndex: 120 entries, 0 to 119
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   text    120 non-null    str  
 1   label   120 non-null    int64
dtypes: int64(1), str(1)
memory usage: 2.0 KB


DataFrame Summary:
Total rows: 120
Total columns: 2

Column names: ['text', 'label']

Label counts:
{0: 60, 1: 60}


In [16]:
# Example: display the text, label, and word count of the row at index 4
print(gtest_df['text'][4])
print(gtest_df['label'][4])

print(len(gtest_df['text'][4].split()))

The United States has been known for life, liberty, and the pursuit of happiness, but that's not all. It's also known for the different freedoms made available and its fair government. The electoral college is a system in which states choose representatives to vote on the president. In the past, there was a big debate on whether or not this process was fair. The electoral college is not fair or trustworthy for many reasons.

Imagine that you picked a representative who said they were going to vote for the person you wanted for president. Sadly, they ended up changing their mind. You could end up with a president you don't like or believe in. Voters don't have total control over who their electors vote for. To me, that doesn't sound very fair. If everyone were allowed to vote, the people would be able to ensure that there vote counted towards the person they wanted, and not towards the candidate they were against. Based on multiple polls, a few presidents have won the popular vote, but 

---

# Part 2: Generate AI-Mimicked Human Text using Gemini API

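This section aims to generate 60 paragraphs of AI text that mimics human writing:
- 30 informal/semi-formal paragraphs
- 30 formal paragraphs
- Topics drawn from a list of 6 (each 5-call loop below uses only the first 5)
- Each prompt requests about 1000-1200 words, which are then split into up to 6 paragraphs of 100-200 words

The split can yield fewer than 6 paragraphs per response, so the final count may fall short of 60 (this run produced 54).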

## Step 6: Setup Gemini API

In [26]:
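import os
import re
import time

from google import genai

# Configure Gemini API
# Read the key from the GEMINI_API_KEY environment variable instead of hardcoding it.
# Replace 'YOUR_API_KEY_HERE' only for local experiments, and never commit a real key.
API_KEY = os.environ.get('GEMINI_API_KEY', 'YOUR_API_KEY_HERE')

# Initialize the client
client = genai.Client(api_key=API_KEY)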

# Model to use
MODEL_ID = "gemini-2.5-flash"

print("✅ Gemini API configured successfully!")
print(f"Model: {MODEL_ID}")

✅ Gemini API configured successfully!
Model: gemini-2.5-flash


## Step 7: Define Topics and Prompts

In [20]:
# Define the 6 topics
TOPICS = [
    "The role of technology in shaping everyday decision-making",
    "How people form and change deeply held beliefs",
    "The trade-off between convenience and privacy in modern life",
    "What motivates people to pursue long-term goals",
    "The impact of social environments on individual behavior",
    "How uncertainty influences judgment and choice"
]

# Informal/Semi-formal prompt template
INFORMAL_PROMPT_TEMPLATE = """Write a 1000–1200 word paragraph on {topic} as a human might write it in an informal or semi-formal setting.
Use natural variation in sentence length and structure.
Allow mild repetition or digression.
Avoid overly polished or academic phrasing.
Do not follow a rigid structure or outline.
Write as if expressing thoughts naturally rather than optimizing clarity. Return just the paragraph in clean text no extras."""

# Formal prompt template
FORMAL_PROMPT_TEMPLATE = """Write a 1000–1200 word paragraph on {topic} in a formal but natural human writing style.
Use clear sentences and a coherent line of reasoning, but avoid rigid structure or textbook phrasing.
Allow mild redundancy or qualification where appropriate.
Do not over-optimize clarity or polish.
Write as if explaining your thoughts carefully rather than presenting a finished essay. Return just the paragraph in clean text."""

print("✅ Topics and prompt templates defined")
print(f"\nTopics ({len(TOPICS)}):")
for i, topic in enumerate(TOPICS, 1):
    print(f"  {i}. {topic}")

✅ Topics and prompt templates defined

Topics (6):
  1. The role of technology in shaping everyday decision-making
  2. How people form and change deeply held beliefs
  3. The trade-off between convenience and privacy in modern life
  4. What motivates people to pursue long-term goals
  5. The impact of social environments on individual behavior
  6. How uncertainty influences judgment and choice
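Each template carries a single `{topic}` placeholder that `str.format` fills in. The following sketch shows how every topic and formality combination could be expanded up front. The shortened templates and topic strings are stand-ins assumed for illustration, not the notebook's actual prompts.

```python
# Shortened stand-ins for the notebook's templates (assumed for illustration)
TEMPLATES = {
    "informal": "Write informally about {topic}.",
    "formal": "Write formally about {topic}.",
}
topics = ["privacy", "motivation", "uncertainty"]

# One record per (formality, topic) combination
prompts = [
    {"formality": name, "topic": topic, "prompt": template.format(topic=topic)}
    for name, template in TEMPLATES.items()
    for topic in topics
]

print(len(prompts))
print(prompts[0]["prompt"])
```

Keeping the formality and topic next to each prompt makes it easier to trace a generated paragraph back to the request that produced it.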


## Step 8: Helper Functions

In [24]:
def split_into_paragraphs(text, min_words=100, max_words=200):
    """
    Split text into paragraphs of roughly min_words to max_words words at
    sentence boundaries, using spaCy for sentence segmentation. A trailing
    chunk shorter than min_words is dropped.
    """
    import spacy
    nlp = spacy.load('en_core_web_sm')
    
    doc = nlp(text)
    sentences = [sent.text.strip() for sent in doc.sents]
    
    paragraphs = []
    current_paragraph = []
    current_word_count = 0
    
    for sentence in sentences:
        sentence_words = len(sentence.split())
        
        # If adding this sentence would exceed max_words and we already have min_words
        if current_word_count + sentence_words > max_words and current_word_count >= min_words:
            paragraphs.append(' '.join(current_paragraph))
            current_paragraph = [sentence]
            current_word_count = sentence_words
        else:
            current_paragraph.append(sentence)
            current_word_count += sentence_words
    
    # Add remaining paragraph if it meets minimum requirement
    if current_word_count >= min_words:
        paragraphs.append(' '.join(current_paragraph))
    
    return paragraphs


def generate_text_from_gemini(prompt, max_retries=3):
    """
    Generate text from Gemini API with retry logic.
    """
    for attempt in range(max_retries):
        try:
            response = client.models.generate_content(
                model=MODEL_ID,
                contents=prompt
            )
            return response.text
        except Exception as e:
            print(f"  ⚠️  Attempt {attempt + 1} failed: {str(e)}")
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"  ⏳ Waiting {wait_time} seconds before retry...")
                time.sleep(wait_time)
            else:
                print(f"  ❌ Failed after {max_retries} attempts")
                return None
    return None


print("✅ Helper functions defined:")
print("  - split_into_paragraphs(): Split text into 100-200 word chunks")
print("  - generate_text_from_gemini(): Call Gemini API with retry logic")

✅ Helper functions defined:
  - split_into_paragraphs(): Split text into 100-200 word chunks
  - generate_text_from_gemini(): Call Gemini API with retry logic
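The retry logic in `generate_text_from_gemini()` waits `2 ** attempt` seconds between attempts, which is 1 second and then 2 seconds for the default three attempts. Below is a minimal sketch of the same pattern, assuming a hypothetical `call_with_backoff` helper whose `sleep` argument can be swapped out so the schedule can be checked without actually waiting:

```python
import time


def call_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it succeeds, waiting base_delay * 2**attempt between tries.

    Returns fn()'s result, or None once max_retries attempts have failed.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # broad on purpose in this sketch
            if attempt == max_retries - 1:
                return None
            sleep(base_delay * 2 ** attempt)
    return None


# Hypothetical flaky function: fails twice, then succeeds
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


waits = []
result = call_with_backoff(flaky, sleep=waits.append)  # record waits instead of sleeping
print(result, waits)
```

Passing `sleep` as a parameter is a design choice for testability. In the notebook itself the default `time.sleep` would apply.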


## Step 9: Generate Informal/Semi-formal Paragraphs (30 total)

In [27]:
# Generate 30 informal paragraphs (5 iterations × 6 paragraphs each)
informal_paragraphs = []
informal_full_texts = []  # Store full responses for inspection

print("Generating 30 informal/semi-formal paragraphs...")
print("=" * 80)

# We need 5 iterations to get 30 paragraphs (5 × 6 = 30)
for iteration in range(5):
    topic = TOPICS[iteration % len(TOPICS)]  # Cycle through topics
    
    print(f"\n📝 Iteration {iteration + 1}/5")
    print(f"   Topic: {topic}")
    
    # Create prompt
    prompt = INFORMAL_PROMPT_TEMPLATE.format(topic=topic)
    
    # Generate text
    print("   ⏳ Calling Gemini API...")
    generated_text = generate_text_from_gemini(prompt)
    
    if generated_text:
        # Store full text
        informal_full_texts.append({
            'iteration': iteration + 1,
            'topic': topic,
            'formality': 'informal',
            'full_text': generated_text,
            'word_count': len(generated_text.split())
        })
        
        # Split into paragraphs
        paragraphs = split_into_paragraphs(generated_text)
        
        # Take first 6 paragraphs (or all if less than 6)
        selected_paragraphs = paragraphs[:6]
        informal_paragraphs.extend(selected_paragraphs)
        
        print(f"   ✅ Generated {len(paragraphs)} paragraphs, selected {len(selected_paragraphs)}")
        print(f"   📊 Total informal paragraphs so far: {len(informal_paragraphs)}")
    else:
        print(f"   ❌ Failed to generate text for iteration {iteration + 1}")
    
    # Wait between API calls to avoid rate limits
    if iteration < 4:  # Don't wait after the last iteration
        print("   ⏳ Waiting 2 seconds before next API call...")
        time.sleep(2)

print("\n" + "=" * 80)
print(f"✅ Informal paragraph generation complete!")
print(f"   Total paragraphs generated: {len(informal_paragraphs)}")

Generating 30 informal/semi-formal paragraphs...

📝 Iteration 1/5
   Topic: The role of technology in shaping everyday decision-making
   ⏳ Calling Gemini API...
   ✅ Generated 6 paragraphs, selected 6
   📊 Total informal paragraphs so far: 6
   ⏳ Waiting 2 seconds before next API call...

📝 Iteration 2/5
   Topic: How people form and change deeply held beliefs
   ⏳ Calling Gemini API...
   ✅ Generated 5 paragraphs, selected 5
   📊 Total informal paragraphs so far: 11
   ⏳ Waiting 2 seconds before next API call...

📝 Iteration 3/5
   Topic: The trade-off between convenience and privacy in modern life
   ⏳ Calling Gemini API...
   ✅ Generated 6 paragraphs, selected 6
   📊 Total informal paragraphs so far: 17
   ⏳ Waiting 2 seconds before next API call...

📝 Iteration 4/5
   Topic: What motivates people to pursue long-term goals
   ⏳ Calling Gemini API...
   ✅ Generated 6 paragraphs, selected 6
   📊 Total informal paragraphs so far: 23
   ⏳

## Step 10: Generate Formal Paragraphs (30 total)

In [28]:
# Generate 30 formal paragraphs (5 iterations × 6 paragraphs each)
formal_paragraphs = []
formal_full_texts = []  # Store full responses for inspection

print("Generating 30 formal paragraphs...")
print("=" * 80)

# We need 5 iterations to get 30 paragraphs (5 × 6 = 30)
for iteration in range(5):
    topic = TOPICS[iteration % len(TOPICS)]  # Cycle through topics
    
    print(f"\n📝 Iteration {iteration + 1}/5")
    print(f"   Topic: {topic}")
    
    # Create prompt
    prompt = FORMAL_PROMPT_TEMPLATE.format(topic=topic)
    
    # Generate text
    print("   ⏳ Calling Gemini API...")
    generated_text = generate_text_from_gemini(prompt)
    
    if generated_text:
        # Store full text
        formal_full_texts.append({
            'iteration': iteration + 1,
            'topic': topic,
            'formality': 'formal',
            'full_text': generated_text,
            'word_count': len(generated_text.split())
        })
        
        # Split into paragraphs
        paragraphs = split_into_paragraphs(generated_text)
        
        # Take first 6 paragraphs (or all if less than 6)
        selected_paragraphs = paragraphs[:6]
        formal_paragraphs.extend(selected_paragraphs)
        
        print(f"   ✅ Generated {len(paragraphs)} paragraphs, selected {len(selected_paragraphs)}")
        print(f"   📊 Total formal paragraphs so far: {len(formal_paragraphs)}")
    else:
        print(f"   ❌ Failed to generate text for iteration {iteration + 1}")
    
    # Wait between API calls to avoid rate limits
    if iteration < 4:  # Don't wait after the last iteration
        print("   ⏳ Waiting 2 seconds before next API call...")
        time.sleep(2)

print("\n" + "=" * 80)
print(f"✅ Formal paragraph generation complete!")
print(f"   Total paragraphs generated: {len(formal_paragraphs)}")

Generating 30 formal paragraphs...

📝 Iteration 1/5
   Topic: The role of technology in shaping everyday decision-making
   ⏳ Calling Gemini API...
   ✅ Generated 6 paragraphs, selected 6
   📊 Total formal paragraphs so far: 6
   ⏳ Waiting 2 seconds before next API call...

📝 Iteration 2/5
   Topic: How people form and change deeply held beliefs
   ⏳ Calling Gemini API...
   ✅ Generated 5 paragraphs, selected 5
   📊 Total formal paragraphs so far: 11
   ⏳ Waiting 2 seconds before next API call...

📝 Iteration 3/5
   Topic: The trade-off between convenience and privacy in modern life
   ⏳ Calling Gemini API...
   ✅ Generated 6 paragraphs, selected 6
   📊 Total formal paragraphs so far: 17
   ⏳ Waiting 2 seconds before next API call...

📝 Iteration 4/5
   Topic: What motivates people to pursue long-term goals
   ⏳ Calling Gemini API...
   ✅ Generated 3 paragraphs, selected 3
   📊 Total formal paragraphs so far: 20
   ⏳ Waiting 2 seconds bef

## Step 11: Combine All Paragraphs and Create DataFrame

In [29]:
# Combine all AI-mimicked paragraphs
all_ai_mimic_paragraphs = informal_paragraphs + formal_paragraphs

print(f"Total AI-mimicked paragraphs: {len(all_ai_mimic_paragraphs)}")
print(f"  - Informal: {len(informal_paragraphs)}")
print(f"  - Formal: {len(formal_paragraphs)}")

# If we have more than 60, take the first 60
if len(all_ai_mimic_paragraphs) > 60:
    print(f"\n⚠️  Generated {len(all_ai_mimic_paragraphs)} paragraphs. Taking first 60.")
    all_ai_mimic_paragraphs = all_ai_mimic_paragraphs[:60]
elif len(all_ai_mimic_paragraphs) < 60:
    print(f"\n⚠️  Only generated {len(all_ai_mimic_paragraphs)} paragraphs. Need {60 - len(all_ai_mimic_paragraphs)} more.")

# Create DataFrame for AI-mimicked text with label=2
ai_mimic_df = pd.DataFrame({
    'text': all_ai_mimic_paragraphs,
    'label': [2] * len(all_ai_mimic_paragraphs)  # label=2 for AI-mimicked text
})

print(f"\n✅ Created AI-mimicked DataFrame:")
print(f"   Shape: {ai_mimic_df.shape}")
print(f"   Columns: {list(ai_mimic_df.columns)}")
print(f"\nSample rows:")
ai_mimic_df.head(3)

Total AI-mimicked paragraphs: 54
  - Informal: 28
  - Formal: 26

⚠️  Only generated 54 paragraphs. Need 6 more.

✅ Created AI-mimicked DataFrame:
   Shape: (54, 2)
   Columns: ['text', 'label']

Sample rows:


Unnamed: 0,text,label
0,"It's actually wild, isn't it, when you stop fo...",2
1,Remember when you’d just pick a place you knew...,2
2,"On one hand, technology streamlines things; ou...",2
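Since this run ended 6 paragraphs short of the 60 target, one option is to keep requesting text until the target is met. The following is a minimal sketch, not the notebook's method. It assumes a hypothetical `top_up` helper whose `generate_fn` and `split_fn` arguments stand in for `generate_text_from_gemini()` and `split_into_paragraphs()`, and it uses stub functions here so that no API call is made.

```python
from itertools import cycle


def top_up(paragraphs, target, topics, generate_fn, split_fn, max_calls=10):
    """Request more text until `target` paragraphs exist or the call budget runs out."""
    collected = list(paragraphs)
    calls = 0
    for topic in cycle(topics):
        if len(collected) >= target or calls >= max_calls:
            break
        calls += 1
        text = generate_fn(topic)
        if not text:
            continue  # failed call: move on to the next topic
        needed = target - len(collected)
        collected.extend(split_fn(text)[:needed])
    return collected


# Stubs standing in for the API call and the spaCy-based splitter
def fake_generate(topic):
    return f"{topic} A\n\n{topic} B\n\n{topic} C\n\n{topic} D"


def fake_split(text):
    return text.split("\n\n")


existing = [f"paragraph {i}" for i in range(54)]
result = top_up(existing, target=60, topics=["privacy", "motivation"],
                generate_fn=fake_generate, split_fn=fake_split)
print(len(result))
```

The `max_calls` budget keeps the loop from running indefinitely if the generator keeps failing or returns text too short to split.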


## Step 12: Merge with Existing Data and Save

In [30]:
# Combine with existing gtest_df (which has 60 human + 60 AI samples)
print("Current gtest_df:")
print(f"  Shape: {gtest_df.shape}")
print(f"  Label distribution:")
print(gtest_df['label'].value_counts())

# Append AI-mimicked data
gtest_df_final = pd.concat([gtest_df, ai_mimic_df], ignore_index=True)

# Shuffle the final dataset
gtest_df_final = gtest_df_final.sample(frac=1, random_state=42).reset_index(drop=True)

print(f"\n✅ Final combined dataset:")
print(f"  Shape: {gtest_df_final.shape}")
print(f"  Label distribution:")
print(gtest_df_final['label'].value_counts().sort_index())
print(f"\n  Label key:")
print(f"    0 = Human")
print(f"    1 = AI")
print(f"    2 = AI-mimicked Human")

# Save to CSV
output_file = 'gtest_data.csv'
gtest_df_final.to_csv(output_file, index=False)

print(f"\n✅ Saved to: {output_file}")
print(f"   Total samples: {len(gtest_df_final)}")

# Update the gtest_df variable to point to the final version
gtest_df = gtest_df_final

print(f"\n✅ gtest_df updated with all 3 classes and available for inspection!")

Current gtest_df:
  Shape: (120, 2)
  Label distribution:
label
0    60
1    60
Name: count, dtype: int64

✅ Final combined dataset:
  Shape: (174, 2)
  Label distribution:
label
0    60
1    60
2    54
Name: count, dtype: int64

  Label key:
    0 = Human
    1 = AI
    2 = AI-mimicked Human

✅ Saved to: gtest_data.csv
   Total samples: 174

✅ gtest_df updated with all 3 classes and available for inspection!


## Step 13: Inspect Final Dataset

In [31]:
# Display summary statistics
print("Final Dataset Summary")
print("=" * 80)
print(f"Total samples: {len(gtest_df)}")
print(f"\nLabel distribution:")
for label, count in gtest_df['label'].value_counts().sort_index().items():
    label_name = {0: 'Human', 1: 'AI', 2: 'AI-mimicked'}[label]
    print(f"  {label} ({label_name:15s}): {count} samples")

print(f"\nDataFrame info:")
gtest_df.info()

print("\n" + "=" * 80)
print("Sample from each class:")
print("=" * 80)

for label in [0, 1, 2]:
    label_name = {0: 'Human', 1: 'AI', 2: 'AI-mimicked'}[label]
    sample = gtest_df[gtest_df['label'] == label].iloc[0]
    print(f"\n{label_name} (label={label}):")
    print("-" * 80)
    print(sample['text'][:400] + "..." if len(sample['text']) > 400 else sample['text'])
    print(f"\nWord count: {len(sample['text'].split())}")

print("\n" + "=" * 80)
gtest_df.head(10)

Final Dataset Summary
Total samples: 174

Label distribution:
  0 (Human          ): 60 samples
  1 (AI             ): 60 samples
  2 (AI-mimicked    ): 54 samples

DataFrame info:
<class 'pandas.DataFrame'>
RangeIndex: 174 entries, 0 to 173
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   text    174 non-null    str  
 1   label   174 non-null    int64
dtypes: int64(1), str(1)
memory usage: 2.8 KB

Sample from each class:

Human (label=0):
--------------------------------------------------------------------------------
When were voting for president were not technically voting for the president in fact we are voting for the slate of electors. The electors can be anyone without a public holding office. Electoral college process is not a good process for presidency. While a president can get the majority of the popular vote on the other hand, he could have the minority of the electoral college votes. That presiden...

Word count:

Unnamed: 0,text,label
0,We naturally gravitate towards information and...,2
1,"And then you move out of that initial, super-i...",2
2,When were voting for president were not techni...,0
3,"And then, as we get older, these foundational ...",2
4,It’s like putting money in a mental bank accou...,2
5,It might be a vision of a future self‚Äîa health...,2
6,"""Plastic water bottles, a ubiquitous sight in ...",1
7,"Limiting car use causes pollution, increases c...",1
8,"It's really something, isn't it, how we come t...",2
9,"Dear Senator, I know that you have many issues...",0
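Label 2 rows come from chunks of about 100-200 words, so their lengths may differ from those of the label 0 and 1 essays. A per-class word-count summary is a quick way to check this. The sketch below runs on toy data and assumes hypothetical `label_name` and `word_count` columns that the notebook does not create.

```python
import pandas as pd

LABEL_NAMES = {0: "Human", 1: "AI", 2: "AI-mimicked"}

# Toy stand-in for the final gtest_df
toy = pd.DataFrame({
    "text": [
        "one two three",
        "one two",
        "one two three four",
        "one",
        "one two three four five",
    ],
    "label": [0, 0, 1, 2, 2],
})

# Readable class names and whitespace-based word counts
toy["label_name"] = toy["label"].map(LABEL_NAMES)
toy["word_count"] = toy["text"].map(lambda t: len(t.split()))

summary = toy.groupby("label_name")["word_count"].agg(["count", "mean", "min", "max"])
print(summary)
```

If the real summary shows a large length gap between classes, a classifier could pick up on length rather than style, which is worth keeping in mind when interpreting results on this test set.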
