This project explores the CORD-19 dataset, which contains metadata of COVID-19 research papers. The goal is to perform a complete data science workflow, including data loading, cleaning, analysis, visualization, and building an interactive application using Streamlit.
By completing this project, I gained hands-on experience with real-world data, learning how to handle missing values, extract insights, visualize trends, and create interactive dashboards.
- File used: `metadata.csv` from the CORD-19 dataset
- Source: CORD-19 Dataset
- Key columns:
  - `title`: Title of the research paper
  - `abstract`: Abstract text of the paper
  - `publish_time`: Publication date
  - `journal`: Journal name
  - `source_x`: Dataset source
- Download the dataset and place `metadata.csv` in your project folder.
- Load the dataset using pandas:
import pandas as pd
df = pd.read_csv('metadata.csv', low_memory=False)
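If only the key columns listed above are needed, the load step can be narrowed and the publication date parsed up front. The snippet below is a minimal sketch, not the project's actual loading code: the helper name `load_metadata`, the `KEY_COLUMNS` list, and the three-row in-memory sample are all assumptions made for illustration, and the sample stands in for the real `metadata.csv` so the example runs without the download.

```python
import io

import pandas as pd

# Key columns named in this README; the list name itself is illustrative.
KEY_COLUMNS = ["title", "abstract", "publish_time", "journal", "source_x"]


def load_metadata(source):
    """Hypothetical helper: read only the key columns and parse publish_time."""
    df = pd.read_csv(source, usecols=KEY_COLUMNS, low_memory=False)
    # errors="coerce" turns missing or unparseable dates into NaT instead of raising.
    df["publish_time"] = pd.to_datetime(df["publish_time"], errors="coerce")
    return df


# Tiny made-up sample standing in for metadata.csv (not real CORD-19 records).
sample = io.StringIO(
    "title,abstract,publish_time,journal,source_x,extra\n"
    "Paper A,Some abstract,2020-03-15,Journal X,PMC,1\n"
    "Paper B,,2021-01-10,,WHO,2\n"
    "Paper C,Another abstract,,Journal Y,PMC,3\n"
)

sample_df = load_metadata(sample)

# Count missing values per column as a first step before cleaning.
missing_counts = sample_df.isna().sum()
print(sample_df.shape)
print(missing_counts)
```

For the real file, the same helper would be called as `load_metadata('metadata.csv')`. Reading a subset of columns keeps memory use lower on a large CSV, and the per-column missing counts give a quick picture of what the cleaning step has to handle.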