# Data Download

This notebook downloads the datasets used for COVID-19 aspect-based sentiment analysis (ABSA) and emotion analysis.

**Authors:** Marko Haralović, Onat Akca, Salih Eren Yücetürk

## COVIDSenti Dataset

90,000 COVID-19 tweets with sentiment labels (positive, negative, neutral).

**Source:** https://github.com/usmaann/COVIDSenti

## METS-CoV Dataset

A dataset of medical entities and targeted sentiment annotated on COVID-19-related tweets.

**Source:** https://github.com/YLab-Open/METS-CoV

In [11]:
# Clone the repository into a temporary directory
!git clone https://github.com/YLab-Open/METS-CoV.git data/METS-CoV-temp

# Keep only the dataset files, then remove the temporary clone
!mkdir -p data/METS-CoV
!mv data/METS-CoV-temp/dataset/* data/METS-CoV/
!rm -rf data/METS-CoV-temp

print("Downloaded METS-CoV dataset")
!ls -lh data/METS-CoV/

Cloning into 'data/METS-CoV-temp'...
remote: Enumerating objects: 69, done.
remote: Counting objects: 100% (69/69), done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 69 (delta 39), reused 50 (delta 27), pack-reused 0 (from 0)
Receiving objects: 100% (69/69), 7.05 MiB | 36.65 MiB/s, done.
Resolving deltas: 100% (39/39), done.
Downloaded METS-CoV dataset
total 1.5K
-rw-r--r-- 1 s3758869 30019 306K Dec 12 14:20 MEST-CoV-test.csv
-rw-r--r-- 1 s3758869 30019 307K Dec 12 14:20 METS-CoV-dev.csv
-rw-r--r-- 1 s3758869 30019 1.5M Dec 12 14:20 METS-CoV-train.csv
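
The listing above confirms that the files exist, but not what they contain. A minimal sketch for peeking at each CSV (column names and row count) using only the standard library is shown here. The helper names `summarize_csv` and `summarize_directory` are hypothetical, and UTF-8 encoding is an assumption about the files rather than something the repository states.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file (hypothetical helper)."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    # UTF-8 is an assumption about the dataset files.
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first record holds the column names
        row_count = sum(1 for _ in reader)  # remaining records are data rows
    return header, row_count


def summarize_directory(directory):
    """Map each CSV file name in `directory` to its (header, row_count)."""
    return {p.name: summarize_csv(p) for p in sorted(Path(directory).glob("*.csv"))}


if __name__ == "__main__":
    # Prints nothing if the directory is missing or holds no CSV files.
    for name, (header, rows) in summarize_directory("data/METS-CoV").items():
        print(f"{name}: {rows} rows, columns={header}")
```

Counting records with `csv.reader` rather than counting lines matters for tweet data, where a quoted text field may itself contain line breaks.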


In [9]:
import os
import requests
import pandas as pd
from pathlib import Path

# COVIDSenti
# https://github.com/usmaann/COVIDSenti

project_root = Path.cwd()
data_dir = project_root / 'data' / 'COVIDSenti'
data_dir.mkdir(parents=True, exist_ok=True)

print(f"Project root: {project_root}")
print(f"Data directory: {data_dir}")
print(f"Downloading to: {data_dir.absolute()}\n")

# Download the three subsets (A, B, C) and the full dataset
datasets = {
    'COVIDSenti-A': 'https://raw.githubusercontent.com/usmaann/COVIDSenti/main/COVIDSenti-A.csv',
    'COVIDSenti-B': 'https://raw.githubusercontent.com/usmaann/COVIDSenti/main/COVIDSenti-B.csv',
    'COVIDSenti-C': 'https://raw.githubusercontent.com/usmaann/COVIDSenti/main/COVIDSenti-C.csv',
    'COVIDSenti-Full': 'https://raw.githubusercontent.com/usmaann/COVIDSenti/main/COVIDSenti.csv'
}

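for name, url in datasets.items():
    # A timeout keeps the cell from hanging on a stalled connection
    response = requests.get(url, timeout=60)
    if response.status_code == 200:
        # Save the raw bytes as data/COVIDSenti/<name>.csv
        filepath = data_dir / f'{name}.csv'
        filepath.write_bytes(response.content)
    else:
        print(f"✗ Failed to download {name}: {response.status_code}")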

Project root: /home/s3758869/synchain-absa-emotion
Data directory: /home/s3758869/synchain-absa-emotion/data/COVIDSenti
Downloading to: /home/s3758869/synchain-absa-emotion/data/COVIDSenti
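
The download cell fetches every file again on each run and reports nothing on success. One possible variant, sketched here under assumptions, skips files that are already present and writes to a temporary `.part` file first so that an interrupted download does not leave a truncated CSV behind. The names `http_get` and `download_if_missing`, the `fetch` parameter, and the `.part` suffix are all hypothetical; only `requests.get`, `raise_for_status`, and `pathlib` calls come from the libraries themselves.

```python
from pathlib import Path


def http_get(url, timeout=60):
    """Fetch `url` and return the body as bytes, raising on HTTP errors."""
    import requests  # imported lazily so download_if_missing can be tried offline

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.content


def download_if_missing(url, dest, fetch=http_get):
    """Download `url` to `dest` unless a non-empty file already exists.

    Returns True if a download happened, False if the existing file was reused.
    """
    dest = Path(dest)
    if dest.exists() and dest.stat().st_size > 0:
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    tmp.write_bytes(fetch(url))
    tmp.replace(dest)  # rename only after the full body has been written
    return True
```

In the loop above, this could be called as `download_if_missing(url, data_dir / f'{name}.csv')`, printing a status line based on the returned flag.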



## Verify Downloads

In [10]:
import os

# The download cell saves the full COVIDSenti dataset as COVIDSenti-Full.csv
datasets = {
    'COVIDSenti': 'data/COVIDSenti/COVIDSenti-Full.csv',
    'METS-CoV': 'data/METS-CoV'
}

for name, path in datasets.items():
    if os.path.exists(path):
        print(f"[OK] {name} downloaded")
    else:
        print(f"[MISSING] {name}")

[OK] COVIDSenti downloaded
[OK] METS-CoV downloaded
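
An existence check passes even when a directory is empty or a file has zero bytes. A slightly stricter check could look like the sketch below, which also flags empty files and directories without any CSV. The helper name `check_dataset` and its `pattern` parameter are hypothetical.

```python
from pathlib import Path


def check_dataset(path, pattern="*.csv"):
    """Return a list of problems found at `path`; an empty list means OK."""
    path = Path(path)
    if not path.exists():
        return [f"{path} does not exist"]
    # A directory is checked file by file; a single file is checked directly.
    files = sorted(path.glob(pattern)) if path.is_dir() else [path]
    if not files:
        return [f"{path} contains no files matching {pattern}"]
    return [f"{f} is empty" for f in files if f.stat().st_size == 0]


if __name__ == "__main__":
    targets = {"COVIDSenti": "data/COVIDSenti", "METS-CoV": "data/METS-CoV"}
    for name, path in targets.items():
        problems = check_dataset(path)
        status = "[OK]" if not problems else "[MISSING]"
        print(f"{status} {name}", *problems)
```

Returning the list of problems, rather than a boolean, keeps the reason for a failure visible in the notebook output.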
