# GitHub Data Collection

Using the [GitHub API](https://docs.github.com/pt/rest), this notebook extracts four kinds of data for a GitHub user: the _user profile_, _repositories_, _commits_, and _languages_. In the end, each data set is saved into a __.csv__ file.

Before running this notebook, make sure that the __config.ini__ file has the correct configuration. More information on how to configure and run it can be found in the [README.md](https://github.com/gsdenys/git-stat/blob/main/README.md) file.
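
As a quick pre-flight check, the sketch below uses Python's standard `configparser` to confirm that a configuration file can be read and contains the expected keys. The section name `github` and the keys `user` and `token` are hypothetical placeholders chosen for illustration; the real names are the ones documented in the README.

```python
import configparser


def missing_options(path, required):
    """Return the (section, key) pairs from `required` that are absent in the file."""
    parser = configparser.ConfigParser()
    # read() silently skips files it cannot open, so check what it returns.
    if not parser.read(path):
        raise FileNotFoundError(f"cannot read configuration file: {path}")
    return [
        (section, key)
        for section, key in required
        if not parser.has_option(section, key)
    ]


# Hypothetical layout, for illustration only; see the README for the real keys.
REQUIRED = [("github", "user"), ("github", "token")]

# Usage: missing_options("config.ini", REQUIRED) returns an empty list
# when every required key is present.
```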

## System Preparation 


### Requirements Installation

The imports in this notebook depend on libraries that must be installed first. So, before running this notebook, it is recommended to run the command below in a system terminal, inside this project's home directory.

```bash
pip install -r requirements.txt
```

Once all dependencies are installed, the system is ready to run the [GitHub](http://github.com) data extraction.
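
To confirm the installation from inside the notebook, a small check like the one below can help. It is only a sketch: the list contains `pandas` because this notebook mentions it, and any other name would have to come from `requirements.txt`. Note that the name used for `import` can differ from the name used for `pip install`.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names in `names` that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


# Extend this list with the import names of the packages in requirements.txt.
missing = missing_packages(["pandas"])
if missing:
    print("Missing packages:", ", ".join(missing))
```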

### Import Library

To keep the data extraction efficient and this notebook clean, some custom libraries were developed to interact with the [GitHub API](https://docs.github.com/pt/rest), generate Pandas data frames, and save them to __.csv__ files. More information about these libraries can be found [here](https://github.com/gsdenys/git-stat/blob/main/github/README.md).

So, let's import the custom libraries that will do the job for us.

```python
from github.config import Config

from github.user import User
from github.repository import Repository
from github.commit import Commit
from github.languages import Languages
```

### Initialization

Now, initialize the extractors. Each one provides a simple way to extract data with a single line of code.

```python
config = Config()

user = User(config)
repository = Repository(config)
commit = Commit(config)
languages = Languages(config)
```
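
The extractors in this notebook are used through a `load().save().get()` chain. The sketch below shows one way such a chainable interface can be built: `load()` and `save()` return the object itself, and `get()` ends the chain by returning the data. `SketchExtractor` is a hypothetical stand-in, not the project's real class, and it uses the standard `csv` module with a list of dictionaries instead of Pandas to stay dependency-free.

```python
import csv
import os
import tempfile


class SketchExtractor:
    """Hypothetical stand-in for an extractor; not the project's real class."""

    def __init__(self, path):
        self.path = path  # where save() writes the CSV file
        self.rows = []    # data held inside the extractor

    def load(self, rows):
        # A real extractor would call the GitHub API here;
        # this sketch simply receives the rows.
        self.rows = list(rows)
        return self  # returning self is what allows chaining

    def save(self):
        if not self.rows:
            raise ValueError("nothing to save; call load() first")
        with open(self.path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(self.rows[0]))
            writer.writeheader()
            writer.writerows(self.rows)
        return self

    def get(self):
        return self.rows  # ends the chain by handing back the data


path = os.path.join(tempfile.gettempdir(), "sketch_extractor.csv")
rows = SketchExtractor(path).load([{"User": "gsdenys", "Gists": 1}]).save().get()
```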

## Data Loader

Once the extractors are initialized, each data frame can be obtained from its own extractor.

### User

Load the __User__ data into the extractor object, then save it at the configured path, and store it in the _user\_df_ data frame.

The first rows of the data frame are shown to help users make sure that the data was extracted correctly.

```python
user_df = user.load().save().get()
user_df.head()
```

Dataframe saved at: /tmp/user_info_gsdenys.csv


| | User | Name | Email | Location | Repositories | Gists | Bio |
|---|---|---|---|---|---|---|---|
| 0 | gsdenys | Denys G. santos | gsdenys@gmail.com | São Paulo, SP, Brazil | 36 | 1 | Father, husband, and senior IT professional. |
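
For readers curious about where these columns could come from, here is a minimal sketch of the underlying REST call, written with the standard library only. The endpoint `https://api.github.com/users/{username}` and the payload fields are part of the public GitHub API, but the mapping from fields to column names is an assumption made for illustration and may not match what the project's `User` extractor does.

```python
import json
from urllib.request import Request, urlopen


def fetch_user(username):
    """Fetch a public profile from the GitHub REST API (unauthenticated)."""
    request = Request(
        f"https://api.github.com/users/{username}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(request, timeout=10) as response:
        return json.load(response)


def user_row(payload):
    """Map an API payload to the column names shown above (assumed mapping)."""
    return {
        "User": payload.get("login"),
        "Name": payload.get("name"),
        "Email": payload.get("email"),
        "Location": payload.get("location"),
        "Repositories": payload.get("public_repos"),
        "Gists": payload.get("public_gists"),
        "Bio": payload.get("bio"),
    }


# Usage (needs network access): user_row(fetch_user("gsdenys"))
```

Unauthenticated requests are subject to a low rate limit, which is one reason to supply credentials through the configuration file.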


### Repositories

Load the __Repositories__ data into the extractor object, then save it at the configured path, and store it in the _repos\_df_ data frame.

The first rows of the data frame are shown to help users make sure that the data was extracted correctly.

```python
repos_df = repository.load(user.getData()).save().get()
repos_df.head()
```

Dataframe saved at: /tmp/repos_info_gsdenys.csv


| | Id | Name | Description | Created on | Updated on | Owner | License | Includes wiki | Forks count | Issues count | Stars count | Watchers count | Repo URL | Commits URL | Languages URL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 50627775 | alf-db-constraint | Database constraint to Alfresco Document Type | 2016-01-29T01:27:23Z | 2017-08-09T11:32:25Z | gsdenys | GNU Lesser General Public License v2.1 | True | 0 | 0 | 4 | 4 | https://api.github.com/repos/gsdenys/alf-db-co... | https://api.github.com/repos/gsdenys/alf-db-co... | https://api.github.com/repos/gsdenys/alf-db-co... |
| 1 | 39864719 | alfresco-bulk-export | Automatically exported from code.google.com/p/... | 2015-07-29T00:36:35Z | 2019-09-19T11:17:25Z | gsdenys | GNU Lesser General Public License v3.0 | False | 33 | 17 | 11 | 11 | https://api.github.com/repos/gsdenys/alfresco-... | https://api.github.com/repos/gsdenys/alfresco-... | https://api.github.com/repos/gsdenys/alfresco-... |
| 2 | 197203698 | amqp-client | Lua Client for AMQP | 2019-07-16T13:52:44Z | 2020-09-27T09:41:13Z | gsdenys | Apache License 2.0 | True | 7 | 6 | 5 | 5 | https://api.github.com/repos/gsdenys/amqp-client | https://api.github.com/repos/gsdenys/amqp-clie... | https://api.github.com/repos/gsdenys/amqp-clie... |
| 3 | 241175035 | bpm-engine | Event drive engine for BPMN | 2020-02-17T18:07:22Z | 2020-02-17T18:23:29Z | gsdenys | Apache License 2.0 | True | 0 | 0 | 0 | 0 | https://api.github.com/repos/gsdenys/bpm-engine | https://api.github.com/repos/gsdenys/bpm-engin... | https://api.github.com/repos/gsdenys/bpm-engin... |
| 4 | 27539722 | cmis-java-sample | Breve exemplo de ações usando OpenCMIS | 2014-12-04T12:35:44Z | 2015-03-23T14:09:08Z | gsdenys | | True | 0 | 0 | 1 | 1 | https://api.github.com/repos/gsdenys/cmis-java... | https://api.github.com/repos/gsdenys/cmis-java... | https://api.github.com/repos/gsdenys/cmis-java... |
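
The user profile in this notebook reports 36 repositories, while GitHub's list endpoints return 30 items per page by default, so a complete listing spans more than one page. How the project's `Repository` extractor deals with this is not shown here; the sketch below only illustrates the general technique of following the `Link` response header that GitHub sends with paginated results. The sample header value is made up for illustration.

```python
import re

# Matches entries such as: <https://example.org/items?page=2>; rel="next"
_LINK_PATTERN = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def parse_link_header(value):
    """Turn a Link header value into a {rel: url} dictionary."""
    return {rel: url for url, rel in _LINK_PATTERN.findall(value or "")}


def next_page_url(headers):
    """Return the URL of the next page, or None when on the last page."""
    return parse_link_header(headers.get("Link")).get("next")


# Illustrative header value, not captured from a real response.
sample_header = (
    '<https://api.github.com/users/gsdenys/repos?page=2>; rel="next", '
    '<https://api.github.com/users/gsdenys/repos?page=2>; rel="last"'
)
next_url = next_page_url({"Link": sample_header})
```

A caller would keep requesting `next_url` until `next_page_url` returns `None`. Passing `per_page=100` as a query parameter reduces the number of requests.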


### Commit

Load the __Commit__ data into the extractor object, then save it at the configured path, and store it in the _commits\_df_ data frame.

The first rows of the data frame are shown to help users make sure that the data was extracted correctly.

```python
commits_df = commit.load(repository.get()).save().get()
commits_df.head()
```

Dataframe saved at: /tmp/commits_info_gsdenys.csv


| | Repo Id | Commit Id | Date | Message | Repository |
|---|---|---|---|---|---|
| 0 | 50627775 | 09efd76dac57118f5fc8fbdfdbd4e419415ddfed | 2016-03-09T13:10:27Z | Update README.md | alf-db-constraint |
| 1 | 50627775 | 72f303f68f7f880081ff06f66571ac51c2e2aa0a | 2016-03-09T13:09:16Z | Update README.md | alf-db-constraint |
| 2 | 50627775 | dee2123d9074181662265aee0574a873db0a2d60 | 2016-03-08T23:58:41Z | create a basic program | alf-db-constraint |
| 3 | 50627775 | 9c5dea4bad1f37a17ea70e172db77600c76b3c76 | 2016-03-08T23:38:11Z | initial commit | alf-db-constraint |
| 4 | 50627775 | 1d41bf7b572d75cbd6102504b14a8bd88528d814 | 2016-01-29T01:27:23Z | Initial commit | alf-db-constraint |
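
Two details of the GitHub API are worth knowing when working with commits. First, the `commits_url` field of a repository ends with a URI-template suffix, `{/sha}`, that has to be removed or expanded before the URL can be requested. Second, each item of a commit listing is nested, with the message and date under a `commit` key. The sketch below handles both; the helper names are hypothetical, `OWNER/REPO` is a placeholder, and the choice of the author date (rather than the committer date) is an assumption, since this notebook does not say which one the `Commit` extractor uses.

```python
import re


def strip_uri_template(url):
    """Remove URI-template expressions such as '{/sha}' from a URL."""
    return re.sub(r"\{[^}]*\}", "", url)


def commit_row(repo_id, repo_name, item):
    """Flatten one item of a commit listing into the columns shown above."""
    commit = item.get("commit") or {}
    author = commit.get("author") or {}
    return {
        "Repo Id": repo_id,
        "Commit Id": item.get("sha"),
        "Date": author.get("date"),  # assumption: author date, not committer date
        "Message": commit.get("message"),
        "Repository": repo_name,
    }


# Placeholder URL in the shape the API returns.
listing_url = strip_uri_template("https://api.github.com/repos/OWNER/REPO/commits{/sha}")

# Item shaped like the API response, filled with values from the first row above.
item = {
    "sha": "09efd76dac57118f5fc8fbdfdbd4e419415ddfed",
    "commit": {
        "author": {"date": "2016-03-09T13:10:27Z"},
        "message": "Update README.md",
    },
}
row = commit_row(50627775, "alf-db-constraint", item)
```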


### Languages

Load the __Languages__ data into the extractor object, then save it at the configured path, and store it in the _lang\_df_ data frame.

The first rows of the data frame are shown to help users make sure that the data was extracted correctly.

```python
lang_df = languages.load(repository.get()).save().get()
lang_df.head()
```

Dataframe saved at: /tmp/language_info_gsdenys.csv


| | Repo Id | Language | Size | Repo Name |
|---|---|---|---|---|
| 0 | 50627775 | Java | 16765 | alf-db-constraint |
| 1 | 50627775 | Batchfile | 846 | alf-db-constraint |
| 2 | 50627775 | Shell | 457 | alf-db-constraint |
| 3 | 39864719 | Java | 62654 | alfresco-bulk-export |
| 4 | 197203698 | Lua | 81554 | amqp-client |
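
The languages endpoint of the GitHub API returns one JSON object per repository, mapping each language name to the number of bytes of code written in it. That explains why a single repository produces several rows here. The sketch below shows the flattening step with a hypothetical helper, `language_rows`, using the values listed above for `alf-db-constraint`; it is an illustration and not necessarily how the project's `Languages` extractor is written.

```python
def language_rows(repo_id, repo_name, payload):
    """Turn a {language: bytes} mapping into one row per language."""
    return [
        {
            "Repo Id": repo_id,
            "Language": language,
            "Size": size,  # bytes of code in that language
            "Repo Name": repo_name,
        }
        for language, size in payload.items()
    ]


# Payload shaped like the API response for one repository.
payload = {"Java": 16765, "Batchfile": 846, "Shell": 457}
rows = language_rows(50627775, "alf-db-constraint", payload)
```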
