# Data Download

Download the datasets used for COVID-19 aspect-based sentiment analysis (ABSA) and emotion analysis.

**Authors:** Marko Haralović, Onat Akca, Salih Eren Yücetürk

## COVIDSenti Dataset

90,000 COVID-19 tweets with sentiment labels (positive, negative, neutral).

**Source:** https://github.com/usmaann/COVIDSenti

In [None]:
import requests
from pathlib import Path

# Paths are relative to the parent of the notebook's working directory
project_root = Path.cwd().parent
data_dir = project_root / 'data' / 'input_data' / 'COVIDSenti' / 'raw'
data_dir.mkdir(parents=True, exist_ok=True)

print(f"Project root: {project_root}")
print(f"Data directory: {data_dir}")
print(f"Downloading to: {data_dir.absolute()}\n")

# Raw CSV files in the COVIDSenti GitHub repository
datasets = {
    'COVIDSenti-A': 'https://raw.githubusercontent.com/usmaann/COVIDSenti/main/COVIDSenti-A.csv',
    'COVIDSenti-B': 'https://raw.githubusercontent.com/usmaann/COVIDSenti/main/COVIDSenti-B.csv',
    'COVIDSenti-C': 'https://raw.githubusercontent.com/usmaann/COVIDSenti/main/COVIDSenti-C.csv',
    'COVIDSenti-Full': 'https://raw.githubusercontent.com/usmaann/COVIDSenti/main/COVIDSenti.csv'
}

for name, url in datasets.items():
    # A timeout keeps the cell from hanging indefinitely on a stalled connection
    response = requests.get(url, timeout=60)
    if response.status_code == 200:
        filepath = data_dir / f'{name}.csv'
        filepath.write_bytes(response.content)
    else:
        print(f"Failed to download {name}: {response.status_code}")

Project root: /home/s3758869/synchain-absa-emotion
Data directory: /home/s3758869/synchain-absa-emotion/data/input_data/COVIDSenti
Downloading to: /home/s3758869/synchain-absa-emotion/data/input_data/COVIDSenti
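The description above gives roughly 90,000 tweets, and a quick row count is one way to sanity-check the files after download. A minimal sketch using only the standard library; `csv_row_counts` is a hypothetical helper, and it assumes each file has exactly one header row and is UTF-8 (undecodable bytes are replaced rather than raising).

```python
import csv
from pathlib import Path


def csv_row_counts(directory: Path, pattern: str = "*.csv") -> dict:
    """Return {file name: number of data rows} for each CSV in `directory`.

    The header row is not counted. Rows are parsed with the csv module, so a
    quoted field containing a newline (common in tweet text) counts as one row.
    """
    counts = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="", encoding="utf-8", errors="replace") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header row, if the file has one
            counts[path.name] = sum(1 for _ in reader)
    return counts
```

After the download cell, `csv_row_counts(data_dir)` returns a dict that can be compared against the documented total. Parsing with `csv` rather than counting lines matters here because a tweet containing a line break would otherwise be counted twice.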



## METS-CoV Dataset

Medical Entity and Targeted Sentiment on COVID-19 tweets.

**Source:** https://github.com/YLab-Open/METS-CoV

In [None]:
import shutil
from pathlib import Path

project_root = Path.cwd().parent
mets_temp = project_root / 'data' / 'METS-CoV-temp'  # temporary clone location
mets_dir = project_root / 'data' / 'input_data' / 'METS-CoV' / 'raw'

# Clone the repository (git refuses if mets_temp already exists and is not empty)
!git clone https://github.com/YLab-Open/METS-CoV.git {mets_temp}

# Keep only the contents of the repository's dataset/ folder, then delete the clone
mets_dir.mkdir(parents=True, exist_ok=True)
for file in (mets_temp / 'dataset').glob('*'):
    shutil.move(str(file), str(mets_dir / file.name))

shutil.rmtree(mets_temp)

!ls -lh {mets_dir}

Cloning into '/home/s3758869/synchain-absa-emotion/data/METS-CoV-temp'...
remote: Enumerating objects: 69, done.[K
remote: Enumerating objects: 69, done.[K
remote: Counting objects: 100% (69/69), done.[K
remote: Counting objects: 100% (69/69), done.[K
remote: Compressing objects: 100% (41/41), done.[K
remote: Compressing objects: 100% (41/41), done.[K
remote: Total 69 (delta 39), reused 50 (delta 27), pack-reused 0 (from 0)[K
Receiving objects: 100% (69/69), 7.05 MiB | 34.55 MiB/s, done.
remote: Total 69 (delta 39), reused 50 (delta 27), pack-reused 0 (from 0)[K
Receiving objects: 100% (69/69), 7.05 MiB | 34.55 MiB/s, done.
Resolving deltas: 100% (39/39), done.
Resolving deltas: 100% (39/39), done.
total 1.5K
-rw-r--r-- 1 s3758869 30019 306K Dec 12 15:48 MEST-CoV-test.csv
-rw-r--r-- 1 s3758869 30019 307K Dec 12 15:48 METS-CoV-dev.csv
-rw-r--r-- 1 s3758869 30019 1.5M Dec 12 15:48 METS-CoV-train.csv
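The listing shows the test split saved as `MEST-CoV-test.csv` (letters swapped relative to the other two files), so code that builds every file name from a single `METS-CoV-{split}.csv` pattern would miss it. A minimal sketch that locates all three splits while accepting either spelling; `find_mets_splits` is a hypothetical helper, and the file-name pattern is an assumption based only on the listing above.

```python
import re
from pathlib import Path

# Matches "METS-CoV-train.csv" etc., and also the "MEST-CoV-test.csv" spelling
# that appears in the directory listing.
_SPLIT_RE = re.compile(r"^(?:METS|MEST)-CoV-(train|dev|test)\.csv$", re.IGNORECASE)


def find_mets_splits(directory: Path) -> dict:
    """Map 'train' / 'dev' / 'test' to the matching CSV file in `directory`.

    Raises FileNotFoundError if any of the three splits is missing.
    """
    splits = {}
    for path in Path(directory).iterdir():
        match = _SPLIT_RE.match(path.name)
        if match:
            splits[match.group(1).lower()] = path
    missing = {"train", "dev", "test"} - set(splits)
    if missing:
        raise FileNotFoundError(
            f"Missing METS-CoV split(s) {sorted(missing)} in {directory}"
        )
    return splits
```

With the files above, `find_mets_splits(mets_dir)["test"]` resolves to `MEST-CoV-test.csv`, and a renamed or missing split fails loudly instead of silently producing an empty evaluation set.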

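## SenWave Dataset

Labeled English COVID-19 tweets (`labeledtweets/labeledEn.csv`).

**Source:** https://github.com/gitdevqiang/SenWave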

In [None]:
import requests
from pathlib import Path

# Paths are relative to the parent of the notebook's working directory
project_root = Path.cwd().parent
data_dir = project_root / 'data' / 'input_data' / 'SenWave' / 'raw'
data_dir.mkdir(parents=True, exist_ok=True)

print(f"Project root: {project_root}")
print(f"Data directory: {data_dir}")
print(f"Downloading to: {data_dir.absolute()}\n")

# Labeled English tweets from the SenWave GitHub repository
datasets = {
    'SenWave': 'https://raw.githubusercontent.com/gitdevqiang/SenWave/main/labeledtweets/labeledEn.csv',
}

for name, url in datasets.items():
    # A timeout keeps the cell from hanging indefinitely on a stalled connection
    response = requests.get(url, timeout=60)
    if response.status_code == 200:
        filepath = data_dir / f'{name}.csv'
        filepath.write_bytes(response.content)
        print(f"Downloaded {name}")
    else:
        print(f"Failed to download {name}: {response.status_code}")


Project root: /home/s3758869/synchain-absa-emotion
Data directory: /home/s3758869/synchain-absa-emotion/data/input_data/SenWave
Downloading to: /home/s3758869/synchain-absa-emotion/data/input_data/SenWave

Downloaded SenWave
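The COVIDSenti and SenWave cells repeat the same download loop, fetch every file again on each run, and can leave a truncated CSV behind if interrupted. A minimal sketch of a shared helper that skips files already on disk and streams into a temporary `.part` file before renaming it into place. It swaps `requests` for the standard-library `urllib.request` so it has no third-party dependency, and it raises `urllib.error.HTTPError` on a failed request instead of printing; `download_file` is a hypothetical name.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_file(url: str, dest: Path, timeout: float = 60.0, overwrite: bool = False) -> bool:
    """Download `url` to `dest`; return True if a download actually happened.

    Skips the request when `dest` already exists (unless overwrite=True).
    Data is streamed into a temporary `.part` file in the same folder and
    renamed into place only after the transfer completes, so an interrupted
    run never leaves a half-written file at `dest`.
    """
    dest = Path(dest)
    if dest.exists() and not overwrite:
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        # urlopen raises urllib.error.HTTPError for 4xx/5xx status codes
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url, timeout=timeout) as response:
            shutil.copyfileobj(response, out)
        os.replace(tmp_name, dest)  # atomic when both paths are on one filesystem
    finally:
        # Remove the temporary file if anything above failed
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
    return True
```

Inside either loop above, `download_file(url, data_dir / f'{name}.csv')` would replace the `requests.get` call and the status check; passing `overwrite=True` forces a fresh copy.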


## COVID-19 NLP Text Classification (Kaggle)

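**Source:** https://www.kaggle.com/datasets/datatattle/covid-19-nlp-text-classification

Requires the Kaggle CLI (`pip install kaggle`) and an API token: download `kaggle.json` from https://www.kaggle.com/settings/account, place it in `~/.kaggle/`, and restrict its permissions with `chmod 600 ~/.kaggle/kaggle.json`.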
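A missing token, or one readable by other users (which the Kaggle CLI warns about), is a common cause of failures in the `kaggle` command in the next cell. A minimal sketch that checks the token before calling the CLI. It assumes the standard `kaggle.json` layout with `username` and `key` fields and honours the `KAGGLE_CONFIG_DIR` environment variable, which the Kaggle CLI uses to override `~/.kaggle`; the permission check is only meaningful on POSIX systems. `kaggle_credentials_problems` is a hypothetical helper.

```python
import json
import os
import stat
from pathlib import Path
from typing import List, Optional


def kaggle_credentials_problems(config_dir: Optional[Path] = None) -> List[str]:
    """Return a list of problems with the Kaggle API token (empty list = looks OK).

    Looks in `config_dir`, else $KAGGLE_CONFIG_DIR, else ~/.kaggle.
    """
    if config_dir is None:
        config_dir = Path(os.environ.get("KAGGLE_CONFIG_DIR", Path.home() / ".kaggle"))
    token = Path(config_dir) / "kaggle.json"
    if not token.is_file():
        return [f"{token} not found"]
    try:
        data = json.loads(token.read_text())
    except json.JSONDecodeError as exc:
        return [f"{token} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{token} should contain a JSON object"]
    problems = []
    for field in ("username", "key"):
        if not data.get(field):
            problems.append(f"{token} has no '{field}' entry")
    # Any group/other permission bits mean other users may read the key
    mode = stat.S_IMODE(token.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{token} is accessible to other users; run: chmod 600 {token}")
    return problems
```

An empty list means the token looks usable; otherwise `for problem in kaggle_credentials_problems(): print(problem)` lists what to fix before running the download.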

In [None]:
from pathlib import Path

project_root = Path.cwd().parent
kaggle_dir = project_root / 'data' / 'input_data' / 'COVID-19-NLP' / 'raw'
kaggle_dir.mkdir(parents=True, exist_ok=True)

!kaggle datasets download -d datatattle/covid-19-nlp-text-classification -p {kaggle_dir} --unzip

print(f"\nDataset downloaded to: {kaggle_dir}")
!ls -lh {kaggle_dir}

Dataset URL: https://www.kaggle.com/datasets/datatattle/covid-19-nlp-text-classification
License(s): copyright-authors
Downloading covid-19-nlp-text-classification.zip to /home/s3758869/synchain-absa-emotion/data/input_data/COVID-19-NLP
100%|███████████████████████████████████████| 4.38M/4.38M [00:00<00:00, 199MB/s]

Dataset downloaded to: /home/s3758869/synchain-absa-emotion/data/input_data/COVID-19-NLP
total 1.0K
-rw-r--r-- 1 s3758869 30019 979K Dec 12 16:34 Corona_NLP_test.csv
-rw-r--r-- 1 s3758869 30019  11M Dec 12 16:34 Corona_NLP_train.c
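The listing does not say which text encoding these CSVs use, and a mismatch typically surfaces as a `UnicodeDecodeError` in `pd.read_csv`. A minimal sketch that tries candidate encodings in order and returns the first one that decodes the whole file. The candidate list is an assumption; because `latin-1` maps every byte to some character it always succeeds, so reaching it means "neither UTF-8 nor cp1252" rather than "definitely Latin-1". `detect_text_encoding` is a hypothetical helper.

```python
from pathlib import Path
from typing import Sequence


def detect_text_encoding(path: Path, candidates: Sequence[str] = ("utf-8", "cp1252", "latin-1")) -> str:
    """Return the first encoding in `candidates` that decodes the entire file.

    Order matters: stricter encodings go first, and 'latin-1' (which accepts
    any byte sequence) acts as a last-resort fallback.
    """
    raw = Path(path).read_bytes()
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    raise ValueError(f"None of {list(candidates)} could decode {path}")
```

`pd.read_csv(path, encoding=detect_text_encoding(path))` then passes the detected encoding explicitly. The helper reads the whole file into memory, which is fine at the sizes listed above (about 11 MB for the training file).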