Cornell Financial Data Collection leverages Python, Selenium, and NLP to aggregate and analyze financial data from Cornell's corporate donors, offering a unique exploration of data collection and analysis techniques.

pratyush1712/data-engineering

Cornell Innovation and Entrepreneurship - Data Analysis Platform

Centralized data analysis platform for the Cornell Innovation and Entrepreneurship Lab. This repository contains scripts for data collection, data cleaning, and data analysis.

Getting Started

Prerequisites

  • Python 3.9
  • pip
  • virtualenv
  • Cornell email (NetID) and Capital IQ credentials
  • Node.js and npm (for the client)

Installation

  1. Clone the repository
     git clone <repository-url>
  2. Create a virtual environment
     virtualenv venv
  3. Activate the virtual environment
     source venv/bin/activate
  4. Change into the server directory
     cd server
  5. Install the dependencies
     pip install -r requirements.txt
  6. Create a .env file in the server directory
     touch .env
  7. Add the following environment variables to the .env file (no spaces around the = sign, or the shell will reject the assignment)
     export CORNELL_NETID="your_cornell_netid"
     export CORNELL_PASSWORD="your_cornell_password"
     export CAPITAL_IQ_USERNAME="your_capital_iq_username"
     export CAPITAL_IQ_PASSWORD="your_capital_iq_password"
  8. Source the .env file
     source .env
  9. Run the server
     python app.py
  10. Open a new terminal window and change into the client directory
     cd cornell-data
  11. Install the dependencies
     npm install
  12. Run the client
     npm start
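
If the server fails right after startup, a missing credential is a likely cause. The sketch below is not taken from the repository; it only illustrates one way a Python process could check that the four variables from the .env step are set before any scraping begins. The function name `load_credentials` and the constant `REQUIRED_VARS` are hypothetical.

```python
import os

# Variable names are taken from the .env step above.
REQUIRED_VARS = (
    "CORNELL_NETID",
    "CORNELL_PASSWORD",
    "CAPITAL_IQ_USERNAME",
    "CAPITAL_IQ_PASSWORD",
)


def load_credentials(env=None):
    """Return the required credentials as a dict, or raise if any are unset.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to exercise without touching real secrets.
    """
    env = os.environ if env is None else env
    # Treat empty strings the same as missing variables.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_credentials()` in the same shell that ran `source .env` should return all four values; in a fresh shell it raises an error naming whichever variables are unset.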

Usage

The platform can collect company data in two ways:

  1. Collect data for a list of companies from Capital IQ, Mergent Intellect, or Guidestar individually.
     cd scraping
     python index.py --source
  2. Collect data for a list of companies from Capital IQ, Mergent Intellect, or Guidestar in bulk.
     python index.py
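
The README does not show how `index.py` parses its arguments, so the following is only a sketch of one plausible reading of the two modes: `--source` restricts a run to a single site, and omitting it runs every site. The source keys (`capital_iq`, `mergent_intellect`, `guidestar`) and the helper names are assumptions made for illustration, not the script's actual values.

```python
import argparse

# Hypothetical source keys; the README names the sites but not the flag values.
SOURCES = ("capital_iq", "mergent_intellect", "guidestar")


def build_parser():
    """Build a parser with an optional --source flag limited to known sites."""
    parser = argparse.ArgumentParser(description="Collect company data.")
    parser.add_argument(
        "--source",
        choices=SOURCES,
        default=None,
        help="Scrape one source only; omit to run all sources.",
    )
    return parser


def select_sources(argv=None):
    """Return the list of sources to scrape for the given command line."""
    args = build_parser().parse_args(argv)
    # A single named source means an individual run; no flag means bulk.
    return [args.source] if args.source else list(SOURCES)
```

Under these assumptions, `select_sources(["--source", "guidestar"])` selects only that site, while `select_sources([])` selects all three.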
