# Data Exploration and Cleanup Process

## Initial Research

---

We began by surveying data sources for the top NFT collections across the market:

* OpenSea API - Could not obtain an API key, so we used OpenSea's website to collect data on the top NFT collections manually
* OpenSea TestNet - Did not have collections of interest
* [Rarify API](https://docs.rarify.tech/reference) - Free API keys with limits that did not affect our usage; interfaces with OpenSea
* NonFungible.com - Data only provided as CSV files
* Nansen - API key required a subscription, so we used it only to manually compare our NFT baskets to Nansen's NFT indexes
* Twitter API

## Datasets Used

---

Based on data restrictions and file formats, we decided to use only Rarify's API to collect data, for the following reasons:

1. Real-time data
2. Interfaces with OpenSea
3. Free to use

## Database Construction

---

1. Install the Python to Postgres Database Driver:

```shell
pip install psycopg2
```

2. The database directory contains two Python scripts, `ddl.py` and `dml.py`, which create the database tables and load the default system data used within the application.

3. Advantages to storing the data locally:

* Data stored locally is faster to access
* No API call is needed every time we work with the data, which left more time for analysis
* The Entity Relationship Diagram shows the relationships among collections, tokens, traits, and trade transactions on the blockchain for each collection
* Data from different sources is stored in one location and retrieved the same way, using the contract address of each NFT collection

SHOW ERD FILE
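
As a rough illustration of the relationships described above, the sketch below creates four tables linked by contract address. The table and column names are assumptions made for this example, not the actual contents of `ddl.py`. It uses an in-memory SQLite connection as a stand-in so it runs without a server; a connection returned by `psycopg2.connect(...)` could be passed to the same function, since both drivers follow the Python DB-API.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the project's actual DDL. The types used here are valid in both
# Postgres and SQLite.
DDL_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS collections (
        contract_address TEXT PRIMARY KEY,
        name             TEXT NOT NULL,
        volume           NUMERIC
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS tokens (
        contract_address TEXT NOT NULL REFERENCES collections (contract_address),
        token_id         TEXT NOT NULL,
        PRIMARY KEY (contract_address, token_id)
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS traits (
        contract_address TEXT NOT NULL REFERENCES collections (contract_address),
        trait_type       TEXT NOT NULL,
        trait_value      TEXT NOT NULL,
        rarity_score     NUMERIC,
        PRIMARY KEY (contract_address, trait_type, trait_value)
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS trades (
        tx_hash          TEXT PRIMARY KEY,
        contract_address TEXT NOT NULL REFERENCES collections (contract_address),
        token_id         TEXT,
        price            NUMERIC,
        traded_at        TEXT
    )
    """,
]


def create_tables(conn):
    """Run every DDL statement on a DB-API connection and commit."""
    cur = conn.cursor()
    for statement in DDL_STATEMENTS:
        cur.execute(statement)
    conn.commit()
```

Because every statement uses `IF NOT EXISTS`, calling `create_tables` a second time leaves an existing database unchanged.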



### Rarify API provides the following data:

* An API request returns the top 100 collections by all-time volume traded
* A second request returns the top 100 tokens for each collection
* A third request returns the traits for each collection and their rarity scores
* A fourth request returns the trade transactions
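
The requests listed above could be built with a small helper like the one sketched here, using only the standard library. The base URL, the path, the `limit` parameter, and the Bearer-token header are all placeholders and assumptions, not documented Rarify endpoints; the actual values should be taken from the Rarify API reference.

```python
import json
import urllib.parse
import urllib.request


def build_request(base_url, path, api_key, params=None):
    """Build an authenticated GET request.

    The Bearer-token scheme is an assumption; check the provider's
    documentation for the real authentication header.
    """
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    query = urllib.parse.urlencode(params or {})
    if query:
        url = f"{url}?{query}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder host and path, not a real endpoint.
example = build_request(
    "https://example.invalid", "/collections", "YOUR_API_KEY", {"limit": 100}
)
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers without touching the network.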

### Extract Transform and Load Process

`etl.py` runs nightly to extract data from the Rarify API and store it in a Postgres database hosted on AWS.


### Converting API calls to SQL Queries

Once the data is in the database, SQL queries replace API calls as the way to extract data for analysis.


### Setup and Installation

1. Install the Python to Postgres Database Driver:

```shell
pip install psycopg2
```

2. Install SQLAlchemy:

```shell
pip install sqlalchemy
```

