This project explores the CORD-19 dataset, which contains metadata for COVID-19 research papers. The goal is to carry out a complete data science workflow: loading, cleaning, analyzing, and visualizing the data, then building an interactive application with Streamlit.
By completing this project, I gained hands-on experience with real-world data, learning how to handle missing values, extract insights, visualize trends, and create interactive dashboards.
- File used: `metadata.csv` from the CORD-19 dataset
- Source: CORD-19 Dataset
- Key columns:
  - `title`: Title of the research paper
  - `abstract`: Abstract text of the paper
  - `publish_time`: Publication date
  - `journal`: Journal name
  - `source_x`: Dataset source
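Before cleaning, it helps to see how complete each key column is. The sketch below is a minimal example that assumes the column names listed above exactly match the header of `metadata.csv`. The helper name `missing_summary` is hypothetical and is not part of pandas or the dataset.

```python
import pandas as pd

# Key columns listed above; assumes metadata.csv uses exactly these names.
KEY_COLUMNS = ["title", "abstract", "publish_time", "journal", "source_x"]


def missing_summary(df: pd.DataFrame, columns=KEY_COLUMNS) -> pd.DataFrame:
    """Return the count and percentage of missing values per key column."""
    subset = df[list(columns)]
    counts = subset.isnull().sum()                    # missing cells per column
    percent = (counts / len(subset) * 100).round(2)   # share of all rows
    return pd.DataFrame({"missing": counts, "percent": percent})
```

Once the dataset is loaded as `df`, `print(missing_summary(df))` prints one row per key column. That makes it easier to decide whether to drop a column, drop rows, or fill gaps.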
- Download the dataset and place `metadata.csv` in your project folder.
- Load the dataset using pandas:
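```python
import pandas as pd

# low_memory=False parses the whole file at once, so each column gets a single
# inferred dtype instead of mixed types (avoids pandas' DtypeWarning).
df = pd.read_csv('metadata.csv', low_memory=False)
```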
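With `df` loaded, a natural first analysis is counting papers per publication year. This is a minimal sketch. It assumes `publish_time` holds strings that begin with a four-digit year, either full dates such as `2020-03-15` or bare years such as `2020`. The helper `papers_per_year` is hypothetical. Pulling the year out as text sidesteps a pitfall in recent pandas versions: `pd.to_datetime` can infer one date format from the first entry and then coerce differently formatted entries to `NaT`.

```python
import pandas as pd


def papers_per_year(df: pd.DataFrame) -> pd.Series:
    """Count papers by publication year from the publish_time column.

    The leading four-digit year is extracted from the text, so full dates
    and bare years are treated the same way. Rows without a year are dropped.
    """
    years = (
        df["publish_time"]
        .astype(str)                                # missing values become "nan"/"None"
        .str.extract(r"^(\d{4})", expand=False)     # leading year or NaN
    )
    years = pd.to_numeric(years, errors="coerce").dropna().astype(int)
    return years.value_counts().sort_index()        # chronological order
```

For example, `papers_per_year(df).plot(kind="bar")` draws a quick publication-trend chart, assuming matplotlib is installed.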