# GitHub Data Collection

Using the [GitHub API](https://docs.github.com/pt/rest), this notebook extracts four kinds of data: _user profile_, _repositories_, _commits_, and _languages_. The extracted data is then saved in __.csv__ format.

Before running this notebook, make sure that the __config.ini__ file has the correct configuration. More information on how to configure and run the notebook can be found in the [README.md](https://github.com/gsdenys/git-stat/blob/main/README.md) file.
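A quick pre-flight check can catch a missing or incomplete configuration before any request is made. The sketch below uses Python's standard `configparser` module. The section name `github` and the keys `user` and `token` are hypothetical placeholders, not the project's documented layout, so adjust them to whatever the README specifies.

```python
import configparser
from pathlib import Path


def missing_config_keys(path, required):
    """Return the 'section.key' entries from `required` that are absent or empty.

    `required` maps a section name to the list of keys expected in it.
    """
    # ConfigParser.read() silently skips missing files, so check explicitly.
    if not Path(path).is_file():
        raise FileNotFoundError(f"configuration file not found: {path}")

    # Interpolation is disabled so values containing '%' are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path, encoding="utf-8")

    missing = []
    for section, keys in required.items():
        for key in keys:
            value = parser.get(section, key, fallback="")
            if not value.strip():
                missing.append(f"{section}.{key}")
    return missing


# Hypothetical layout: replace with the section and key names from the README.
REQUIRED = {"github": ["user", "token"]}

if __name__ == "__main__":
    try:
        problems = missing_config_keys("config.ini", REQUIRED)
        print(problems or "config.ini looks complete")
    except FileNotFoundError as err:
        print(err)
```

An empty list means every expected key is present and non-empty.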

## System Preparation 


### Requirements Installation

The imports in this notebook depend on libraries that must be installed first. Before running the notebook, it is recommended to run the command below in a system terminal, from the project's home directory.

```bash
pip install -r requirements.txt
```

Once all dependencies are installed, the system is ready to run the [GitHub](http://github.com) data extraction.
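To confirm the installation from inside the notebook, a small check like the following can help. It is only a sketch: `pandas` is listed because this notebook produces Pandas data frames, and any other package name would have to be taken from `requirements.txt`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names in `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Extend this list with the packages actually named in requirements.txt.
print(missing_packages(["pandas"]))
```

An empty list means every listed package was found in the current environment.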

### Import Library

To keep this notebook clean and the data extraction efficient, a set of custom libraries was developed to interact with the [GitHub API](https://docs.github.com/pt/rest), generate Pandas data frames, and save them to a __.csv__ file. More information about these libraries can be found [here](https://github.com/gsdenys/git-stat/blob/main/github/README.md).

So, let's import the custom libraries that will do the job for us.

In [None]:
from github.user import User
from github.repository import Repository
from github.commit import Commit
from github.languages import Languages 

### Initialization

Now, initialize the extractors. Each one provides a simple way to extract data with a single line of code.

In [None]:
user = User()
repository = Repository()
commit = Commit()
languages = Languages()

## Data Loader

Once the extractors are initialized, each data frame can be obtained from its own extractor.
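For orientation, the four data sets map naturally onto four public GitHub REST endpoints. The sketch below only builds the URLs and makes no request. Whether the custom extractors call exactly these endpoints is an assumption, and the helper names are illustrative rather than part of the project's library.

```python
from urllib.parse import quote

API_ROOT = "https://api.github.com"


def _seg(value):
    """Percent-encode a single path segment."""
    return quote(value, safe="")


def user_url(username):
    # Profile of one user.
    return f"{API_ROOT}/users/{_seg(username)}"


def repos_url(username):
    # Public repositories owned by the user.
    return f"{user_url(username)}/repos"


def commits_url(owner, repo):
    # Commits of one repository.
    return f"{API_ROOT}/repos/{_seg(owner)}/{_seg(repo)}/commits"


def languages_url(owner, repo):
    # Language breakdown (bytes of code per language) of one repository.
    return f"{API_ROOT}/repos/{_seg(owner)}/{_seg(repo)}/languages"


print(commits_url("gsdenys", "git-stat"))
# https://api.github.com/repos/gsdenys/git-stat/commits
```

The URLs also show why the loaders depend on each other: repositories are listed per user, while commits and languages are listed per repository.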

### User

Load the __User__ data into the extractor object, save it to the configured path, and store it in the _user\_df_ data frame.

The first rows of the data frame are shown so that users can confirm the data was extracted correctly.

In [None]:
user_df = user.load().save().get()
user_df.head()
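The `load().save().get()` chain works because each step except the last returns the extractor itself. The class below is a hypothetical stand-in that illustrates this pattern and is not the project's implementation: it receives its rows as an argument instead of calling the GitHub API, and it uses plain dictionaries and the standard `csv` module where the real extractors return Pandas data frames.

```python
import csv
import os
import tempfile


class SketchExtractor:
    """Illustrative fluent extractor; all names here are hypothetical."""

    def __init__(self, path):
        self.path = path  # where save() writes the CSV file
        self.rows = []    # data held inside the extractor

    def load(self, rows):
        # A real extractor would fetch from the API; here rows are passed in.
        self.rows = list(rows)
        return self  # returning self is what enables chaining

    def save(self):
        # Use the union of all keys as the CSV header.
        fieldnames = sorted({key for row in self.rows for key in row})
        with open(self.path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(self.rows)
        return self

    def get(self):
        # Last link of the chain: hand the data back to the caller.
        return self.rows


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "user.csv")
    sample = [{"login": "example-user", "public_repos": 3}]  # made-up row
    result = SketchExtractor(target).load(sample).save().get()
    print(result, os.path.exists(target))
```

Because the data stays inside the object after `load()`, a later call to `get()` on the same extractor can hand it to another loader without fetching again.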

### Repositories

Load the __Repositories__ data into the extractor object, save it to the configured path, and store it in the _repos\_df_ data frame.

The first rows of the data frame are shown so that users can confirm the data was extracted correctly.

In [None]:
repos_df = repository.load(user.getData()).save().get()
repos_df.head()

### Commit

Load the __Commit__ data into the extractor object, save it to the configured path, and store it in the _commits\_df_ data frame.

The first rows of the data frame are shown so that users can confirm the data was extracted correctly.

In [None]:
commits_df = commit.load(repository.get()).save().get()
commits_df.head()

### Languages

Load the __Languages__ data into the extractor object, save it to the configured path, and store it in the _lang\_df_ data frame.

The first rows of the data frame are shown so that users can confirm the data was extracted correctly.

In [None]:
lang_df = languages.load(repository.get()).save().get()
lang_df.head()
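The GitHub REST API reports a repository's languages as a mapping from language name to bytes of code. As a rough illustration of how such a mapping can become table rows, the sketch below flattens one into a list of records. The column names, the `share` column, and the sample numbers are assumptions made for this example and may not match the columns of _lang\_df_.

```python
def language_rows(repo_name, languages):
    """Flatten a {language: bytes} mapping into one record per language."""
    total = sum(languages.values())
    return [
        {
            "repository": repo_name,
            "language": name,
            "bytes": size,
            # Fraction of the repository written in this language.
            "share": size / total if total else 0.0,
        }
        for name, size in languages.items()
    ]


# Made-up payload in the shape returned by the languages endpoint.
sample = {"Python": 3000, "Jupyter Notebook": 1000}
for row in language_rows("git-stat", sample):
    print(row)
```

A list of records like this can be passed directly to `pandas.DataFrame(...)` to build a data frame.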