# Covid Fact Check Claims Analysis
This notebook analyzes fact-check claims on the Covid topic. The dataset was downloaded from the Google Fact Check Tools API using "covid" as the search query. The data was retrieved on 2025-02-18.
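The download step itself is not shown in this notebook. As a rough sketch of how such a dataset could be collected, the snippet below pages through the Fact Check Tools `claims:search` endpoint by following `nextPageToken`. The helper names, the `page_size` default, and the injectable `fetch` callable are illustrative assumptions, not the code that produced `covid.json`.

```python
import json
import urllib.parse
import urllib.request

# Claims search endpoint of the Google Fact Check Tools API.
SEARCH_URL = "https://factchecktools.googleapis.com/v1alpha1/claims:search"

def build_search_url(query, api_key, page_token=None, page_size=50):
    """Build one claims:search request URL (hypothetical helper; page_size is an assumed value)."""
    params = {"query": query, "key": api_key, "pageSize": page_size}
    if page_token:
        params["pageToken"] = page_token
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"

def http_get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def download_claims(query, api_key, fetch=http_get_json):
    """Collect claims page by page until no nextPageToken is returned."""
    claims, page_token = [], None
    while True:
        page = fetch(build_search_url(query, api_key, page_token))
        claims.extend(page.get("claims", []))
        page_token = page.get("nextPageToken")
        if not page_token:
            return claims
```

Passing `fetch` as a parameter keeps the paging logic testable without network access or an API key.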

In [5]:
import pandas as pd
import json

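def load_claims_from_file(filename='../data/covid.json'):
    """Load the saved claims from a JSON file."""
    # Explicit UTF-8 so non-ASCII claim text is read the same way on every platform
    with open(filename, 'r', encoding='utf-8') as json_file:
        return json.load(json_file)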

def convert_claims_to_dataframe(claims):
    # Flatten the nested claimReview list: one output row per (claim, review) pair
    claims_data = []
    for claim in claims:
        for review in claim.get('claimReview', []):
            claims_data.append({
                'text': claim.get('text'),
                'claimant': claim.get('claimant'),
                'claimDate': pd.to_datetime(claim.get('claimDate')),
                'publisher': review.get('publisher', {}).get('name'),
                'reviewUrl': review.get('url'),
                'reviewTitle': review.get('title'),
                'reviewDate': pd.to_datetime(review.get('reviewDate')),
                'textualRating': review.get('textualRating')
            })
    return pd.DataFrame(claims_data)
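
To make the flattening concrete, here is a small self-contained sketch that applies the same loop to one made-up claim with two reviews. The record contents are invented for illustration, and `flatten_claims` is a trimmed-down stand-in for the function above that keeps only four of its fields.

```python
import pandas as pd

def flatten_claims(claims):
    """Emit one row per (claim, review) pair, mirroring convert_claims_to_dataframe."""
    rows = []
    for claim in claims:
        for review in claim.get('claimReview', []):
            rows.append({
                'text': claim.get('text'),
                'claimant': claim.get('claimant'),
                'publisher': review.get('publisher', {}).get('name'),
                'textualRating': review.get('textualRating'),
            })
    return pd.DataFrame(rows)

# Made-up input: one claim with two reviews, the second lacking a publisher name.
sample = [{
    'text': 'Example claim',
    'claimant': 'example user',
    'claimReview': [
        {'publisher': {'name': 'Publisher A'}, 'textualRating': 'False'},
        {'publisher': {'site': 'example.org'}, 'textualRating': 'Misleading'},
    ],
}]

flat = flatten_claims(sample)
print(flat)
```

The claim text repeats on every row, so row counts in the resulting frame are review counts rather than distinct claims. A publisher entry without a `name` key, as in the second review, is one plausible way a null `publisher` value can arise.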

In [6]:
claims = load_claims_from_file()
df = convert_claims_to_dataframe(claims)

display(df.head(10))

| | text | claimant | claimDate | publisher | reviewUrl | reviewTitle | reviewDate | textualRating |
|---|---|---|---|---|---|---|---|---|
| 0 | Covid-19 does not exist. | instagram user | 2025-02-10 00:00:00+00:00 | Full Fact | https://fullfact.org/health/covid-19-exists-lo... | Covid-19 does exist and germs can cause diseas... | 2025-02-14 00:00:00+00:00 | False. Covid-19 is a real disease caused by th... |
| 1 | 2021 video of corpses floating in Ganga during... | PTI Fact Check | 2025-02-13 00:02:32+00:00 | Press Trust of India | https://www.ptinews.com/fact-detail/pti-fact-c... | PTI Fact Check: 2021 video of corpses floating... | NaT | Misleading |
| 2 | Podcaster Ranveer Allahbadia’s 3-year-old vide... | PTI Fact Check | 2025-02-13 21:02:07+00:00 | Press Trust of India | https://www.ptinews.com/fact-detail/pti-fact-c... | PTI Fact Check: Podcaster Ranveer Allahbadia’s... | NaT | Misleading |
| 3 | Pfizer releases new list of its COVID-19 vacci... | Pshegs | 2025-02-07 00:00:00+00:00 | FactCheckHub | https://factcheckhub.com/no-pfizer-did-not-rel... | No, Pfizer did not release list of its COVID-1... | 2025-02-09 00:00:00+00:00 | FALSE - The claim that Pfizer releases new lis... |
| 4 | “We brought that petition after CDC recommende... | Robert F. Kennedy Jr. | 2025-01-30 00:00:00+00:00 | FactCheck.org | https://www.factcheck.org/2025/02/factchecking... | FactChecking RFK Jr.’s Other Health Claims Dur... | 2025-02-06 00:00:00+00:00 | False |
| 5 | “Japan sounds alarm as heart failure surges am... | Social media users, Frank Bergman | 2025-01-18 00:00:00+00:00 | Science Feedback | https://science.feedback.org/review/japanese-s... | Japanese study misrepresented in posts claimin... | 2025-01-18 00:00:00+00:00 | Inaccurate |
| 6 | president donald trump signed executive order ... | x.com | 2025-02-05 21:16:38+00:00 | Lead Stories | https://leadstories.com/hoax-alert/2025/02/fac... | Fact Check: Trump Did NOT Sign Executive Order... | 2025-02-05 21:16:38+00:00 | Out Of Context |
| 7 | Everyone who received at least one dose of an ... | Dolores Cahill | 2025-01-25 00:00:00+00:00 | Science Feedback | https://science.feedback.org/review/no-evidenc... | No evidence of mass deaths among vaccinated pe... | 2025-01-25 00:00:00+00:00 | Inaccurate |
| 8 | A Facebook post claims to contain the list of ... | Isaiah Metters | 2025-02-05 02:00:00+00:00 | Rappler | https://www.rappler.com/newsbreak/fact-check/p... | FACT CHECK: Post on alleged Pfizer COVID-19 va... | 2025-02-05 02:00:00+00:00 | False |
| 9 | Pfizer listed hMPV as a "side effect" of its C... | Social media posts. | 2025-01-09 00:00:00+00:00 | Australian Associated Press | https://aap.com.au/factcheck/no-pfizer-did-not... | No, Pfizer did not list hMPV as a 'side effect... | 2025-02-07 00:52:18+00:00 | False. The virus was listed as an "adverse eve... |


In [7]:
# Calculate and display the number of null values in each column
null_counts = df.isnull().sum()
print("\nNumber of null values in each column:")
print(null_counts)

# Calculate percentage of null values, reusing the counts computed above
null_percentages = (null_counts / len(df)) * 100
print("\nPercentage of null values in each column:")
print(null_percentages.round(2))

# Display total number of rows
total_rows = len(df)
print(f"\nTotal number of rows: {total_rows}")



Number of null values in each column:
text               0
claimant         466
claimDate        533
publisher        222
reviewUrl          0
reviewTitle        0
reviewDate       680
textualRating      0
dtype: int64

Percentage of null values in each column:
text              0.00
claimant         13.26
claimDate        15.16
publisher         6.32
reviewUrl         0.00
reviewTitle       0.00
reviewDate       19.35
textualRating     0.00
dtype: float64

Total number of rows: 3515
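
Counts and percentages can also be read side by side in a single table. The sketch below is one way to build it; `null_summary` is a hypothetical helper name and the demo frame is made up.

```python
import pandas as pd

def null_summary(frame):
    """Return null counts and percentages per column, most incomplete column first."""
    summary = pd.DataFrame({
        'null_count': frame.isnull().sum(),
        # The mean of a boolean mask is the fraction of True values
        'null_pct': (frame.isnull().mean() * 100).round(2),
    })
    return summary.sort_values('null_count', ascending=False)

# Made-up demo data with gaps in two columns.
demo = pd.DataFrame({
    'text': ['a', 'b', 'c', 'd'],
    'claimant': ['x', None, None, 'y'],
    'reviewDate': pd.to_datetime(['2025-01-01', None, None, None]),
})
print(null_summary(demo))
```

Called as `null_summary(df)` on the frame above, this would list `reviewDate` first, since it has the most missing values (680).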


# Time Analysis

In [8]:
# Get date ranges for claim dates and review dates.
# Both columns were parsed with pd.to_datetime when the frame was built,
# so min()/max() already return Timestamps (NaT values are skipped).
claim_date_range = {
    'earliest': df['claimDate'].min(),
    'latest': df['claimDate'].max()
}

review_date_range = {
    'earliest': df['reviewDate'].min(),
    'latest': df['reviewDate'].max()
}

print("\nClaim Date Range:")
print(f"Earliest: {claim_date_range['earliest']}")
print(f"Latest: {claim_date_range['latest']}")

print("\nReview Date Range:")
print(f"Earliest: {review_date_range['earliest']}")
print(f"Latest: {review_date_range['latest']}")



Claim Date Range:
Earliest: 2016-06-20 00:00:00+00:00
Latest: 2025-08-01 00:00:00+00:00

Review Date Range:
Earliest: 2020-01-22 00:00:00+00:00
Latest: 2025-02-14 00:00:00+00:00
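
The latest claim date above (2025-08-01) falls after the retrieval date stated at the top of the notebook (2025-02-18), so at least some `claimDate` values may be incorrect in the source data. A minimal sketch of how such rows could be flagged follows. The `RETRIEVED_ON` constant, the helper name, and the UTC assumption are introduced here for illustration, and the demo frame is made up.

```python
import pandas as pd

# Retrieval date from the notebook introduction, treated as UTC (assumption).
RETRIEVED_ON = pd.Timestamp('2025-02-18', tz='UTC')

def flag_suspicious_dates(frame, cutoff=RETRIEVED_ON):
    """Mark rows whose claimDate is after the cutoff or after their reviewDate."""
    flagged = frame.copy()
    # Comparisons against NaT evaluate to False, so missing dates are never flagged
    flagged['claim_after_cutoff'] = flagged['claimDate'] > cutoff
    flagged['claim_after_review'] = flagged['claimDate'] > flagged['reviewDate']
    return flagged

# Made-up demo rows: a normal pair, a future claim date, and a missing claim date.
demo = pd.DataFrame({
    'claimDate': pd.to_datetime(['2025-01-09', '2025-08-01', None], utc=True),
    'reviewDate': pd.to_datetime(['2025-02-07', None, '2025-02-14'], utc=True),
})
flagged = flag_suspicious_dates(demo)
print(flagged[['claim_after_cutoff', 'claim_after_review']].sum())
```

Rows flagged this way are candidates for manual inspection rather than automatic removal, since the check cannot tell which of the two dates is wrong.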


# Publisher Analysis

In [12]:
# Count rows per publisher. Each row is one review, so these are review counts
publisher_counts = df['publisher'].value_counts().reset_index()
publisher_counts.columns = ['Publisher', 'Number of Claims']

# Count the number of reviews with a missing publisher name
none_publisher_count = df['publisher'].isnull().sum()
print(f"Number of reviews with None publisher: {none_publisher_count}")

# Display the top 15 publishers
top_publishers = publisher_counts.head(15)
display(top_publishers)

Number of reviews with None publisher: 222


| | Publisher | Number of Claims |
|---|---|---|
| 0 | Full Fact | 373 |
| 1 | PolitiFact | 300 |
| 2 | Snopes | 269 |
| 3 | BOOM Fact Check | 227 |
| 4 | AFP Fact Check | 216 |
| 5 | AP News | 182 |
| 6 | FactCheck.org | 181 |
| 7 | Lead Stories | 169 |
| 8 | USA Today | 161 |
| 9 | Science Feedback | 136 |
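
Raw counts are easier to compare as shares of all rows. The sketch below uses `value_counts(normalize=True, dropna=False)` so that rows with a missing publisher form their own group instead of being dropped. The helper name is hypothetical and the demo data is made up.

```python
import pandas as pd

def publisher_shares(frame, top_n=10):
    """Percentage of rows per publisher, keeping missing publishers as a separate group."""
    shares = frame['publisher'].value_counts(normalize=True, dropna=False) * 100
    return shares.round(2).head(top_n)

# Made-up demo data: two rows for A, one for B, one with no publisher.
demo = pd.DataFrame({'publisher': ['A', 'A', 'B', None]})
shares = publisher_shares(demo)
print(shares)
```

Applied to the table above, Full Fact's 373 rows out of 3515 would come to roughly 10.6% of all reviews.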
