# Analyzing Tech Trends with GitHub

* GitHub Repositories 2020 | [Kaggle](https://www.kaggle.com/vatsalparsaniya/github-repositories-analysis)
* GitHub top-starred repositories in specific domains (1200+)

## Content: information on more than 1,000 repositories

---
The data contains **19 main columns**:
1. topic: the base keyword used to fetch the repository.
2. name: repository name.
3. user: user name of the repository owner.
4. star: number of stars given by users.
5. fork: number of forks of the repository.
6. watch: number of watchers of the repository.
7. issue: number of issues in the repository.
8. pullrequests: number of pull requests.
9. projects: number of projects under that topic tag.
10. topic_tag: tags added to the repository by the user.
11. discription_text: short description added by the user.
12. discription_url: additional URL provided by the repository.
13. commits: number of commits to the repository.
14. branches: number of branches in the repository.
15. packages: number of packages.
16. releases: releases of the repository.
17. contributors: number of users who have contributed to the repository.
18. License: name of the license.
19. url: URL of the repository.

>current **repository topics**: Data-Science, Machine-Learning, Open-CV, Computer-Vision, GAN, variational-encoder, Android-studio, flutter, JAVA, awesome, javascript, c++

In [None]:
import pandas as pd

## 1. EDA

- Load the data and inspect basic information such as its size, columns, and summary.
- Extract only the columns needed for the analysis into a separate data frame.

In [None]:
df = pd.read_csv('../input/github-repositories-analysis/Github_data.csv')

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df[['topic', 'name', 'star', 'fork', 'watch', 'issue','topic_tag', 'discription_text', 'commits']]

In [None]:
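# Keep only the columns needed for the analysis.
# .copy() makes github_df an independent frame, so the later column assignments
# do not trigger pandas' SettingWithCopyWarning.
github_df = df[['topic', 'name', 'star', 'fork', 'watch', 'issue','topic_tag', 'discription_text', 'commits']].copy()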

## 2. Data Preprocessing

- The star, fork, and watch columns use a 'k' suffix for thousands. **Convert them to plain integers and remove the '.'**

In [None]:
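# Convert abbreviated counts with a 'k' suffix to integers
def counts(x):
    # Replace the 'k' suffix with three zeros
    rx = x.replace('k','000')
    if '.' in rx:
        # Remove the decimal point, then drop one trailing zero to compensate
        # (this assumes exactly one digit after the decimal point)
        rx = rx.replace('.','')
        rx = rx[:-1]
    return int(rx)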
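
The `counts` function above relies on string manipulation, which only works when an abbreviated value has exactly one digit after the decimal point. As a hedged alternative, the sketch below parses the number arithmetically instead. `parse_count` is a hypothetical helper name that does not appear in the original notebook, and it assumes that 'k' is the only suffix in the data and that commas, if any, are thousands separators.

```python
from decimal import Decimal

def parse_count(text):
    """Convert strings such as '987', '1.2k' or '12.34k' to an int.

    Hypothetical helper: assumes 'k' (thousands) is the only suffix used.
    """
    cleaned = text.strip().lower().replace(',', '')
    if cleaned.endswith('k'):
        # Decimal avoids binary floating-point rounding surprises
        return int(Decimal(cleaned[:-1]) * 1000)
    return int(cleaned)

examples = ['987', '1.2k', '12.34k', '5k']
parsed = [parse_count(value) for value in examples]
print(parsed)
```

It could be applied the same way as `counts`, for example `github_df['star'].apply(parse_count)`.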

In [None]:
# test function counts()
github_df['star'].apply(counts)
github_df['fork'].apply(counts)
github_df['watch'].apply(counts)

In [None]:
# apply function counts() to data frame
github_df['star'] = github_df['star'].apply(counts)
github_df['fork'] = github_df['fork'].apply(counts)
github_df['watch'] = github_df['watch'].apply(counts)

github_df.head()

In [None]:
# Check the statistics summary to obtain the average value before analysis.
github_df.describe()

In [None]:
# check 'topic' column
github_df['topic'].drop_duplicates()

In [None]:
# It seems that 100 rows are extracted for each topic. Check it out.
github_df['topic'][90:110]

In [None]:
github_df['topic'][190:210]
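
Rather than inspecting slices around each boundary, the number of rows per topic can also be counted directly. The following is a minimal sketch on a small made-up frame: the `toy` data is hypothetical and only mirrors the `topic` column of `github_df`, with 3 rows per topic instead of 100.

```python
import pandas as pd

# Hypothetical stand-in for github_df with 3 rows per topic instead of 100
toy = pd.DataFrame({
    'topic': ['Data-Science'] * 3 + ['machine-Learning'] * 3 + ['Open-CV'] * 3,
    'name': [f'repo{i}' for i in range(9)],
})

# Number of rows for each topic; sort=False keeps the order of first appearance
rows_per_topic = toy.groupby('topic', sort=False).size()
print(rows_per_topic)

# True when every topic has the same number of rows
same_size = rows_per_topic.nunique() == 1
print(same_size)
```

On the real data, the same two lines with `github_df` in place of `toy` would show whether every topic really has 100 rows.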

In [None]:
# Check whether the 100 rows were sampled randomly or only the top-ranked repositories were extracted.
github_df[github_df['topic']=='Open-CV']

> The rows appear to be sorted by 'star' in descending order. We need to check whether the data was extracted in order of star count, or whether it was sorted after random extraction.
>
> To find out, search for the keyword directly on GitHub and sort the results by **Most stars**.
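
The first observation, that each topic is sorted by star in descending order, can be checked programmatically. The sketch below uses a hypothetical `toy` frame in place of `github_df` and assumes the star column has already been converted to integers. It only verifies the ordering within the data; it cannot tell how the rows were sampled.

```python
import pandas as pd

# Hypothetical stand-in for github_df after the star column is numeric
toy = pd.DataFrame({
    'topic': ['Open-CV', 'Open-CV', 'Open-CV', 'GAN', 'GAN', 'GAN'],
    'star':  [9500, 7200, 7200, 4000, 12000, 800],
})

# For each topic, check that star never increases from one row to the next
# (ties are allowed, since abbreviated counts can collapse to the same value)
sorted_desc = toy.groupby('topic', sort=False)['star'].apply(
    lambda s: s.is_monotonic_decreasing
)
print(sorted_desc)
```
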

In [None]:
from IPython.display import Image
Image(filename='../input/opencv/Open-CV.png', height=280, width=800)

> The data is five months old, so the ranking has changed somewhat, but this confirms that the repositories were extracted in descending order of stars.
>
> Now let's look at the data for each topic.
> For reference, the median value of star in the statistical summary above was about 8059, so a repository with 10,000 or more stars can be said to deal with a topic that is attracting considerable attention.
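
The number of repositories at or above the 10,000-star cut-off can be computed for all topics at once instead of being counted by eye. This is a minimal sketch on a hypothetical `toy` frame standing in for `github_df`; the name `THRESHOLD` is introduced here for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for github_df with numeric star counts
toy = pd.DataFrame({
    'topic': ['java', 'java', 'java', 'sensor', 'sensor', 'GAN'],
    'star':  [54000, 12000, 9900, 3000, 800, 10000],
})

THRESHOLD = 10_000  # star count used in the text as the cut-off

# Summing the boolean mask per topic gives the number of
# repositories at or above the cut-off
popular_counts = (
    (toy['star'] >= THRESHOLD)
    .groupby(toy['topic'], sort=False)
    .sum()
)
print(popular_counts)
```
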

In [None]:
'''
Topic list =    
0              Data-Science
100        machine-Learning
200                 Open-CV
300         Computer-Vision
400                     GAN
500     variational-encoder
600          Android-studio
700                 flutter
800                    java
900                 awesome
1000             javascript
1100                    c++
1200           Raspberry pi
1300                Arduino
1400                 sensor
'''
# Most Popular Repositories about Data Science(10,000 or more stars -> 17)
github_df[github_df['topic']=='Data-Science'][:17]

In [None]:
# Most Popular Repositories about machine-Learning (10,000 or more stars -> 52)
github_df[github_df['topic']=='machine-Learning'][:52]

In [None]:
# Most Popular Repositories about Open-CV(10,000 or more stars -> 0)
github_df[github_df['topic']=='Open-CV'][:10]

In [None]:
# Most Popular Repositories about Computer-Vision(10,000 or more stars -> 8)
github_df[github_df['topic']=='Computer-Vision'][:10]

In [None]:
# Most Popular Repositories about GAN(10,000 or more stars -> 1)
github_df[github_df['topic']=='GAN'][:10]

In [None]:
# Most Popular Repositories about variational-encoder(10,000 or more stars -> 1)
github_df[github_df['topic']=='variational-encoder'][:10]

In [None]:
# Most Popular Repositories about Android-studio(10,000 or more stars -> 0)
github_df[github_df['topic']=='Android-studio'][:10]

In [None]:
# Most Popular Repositories about flutter(10,000 or more stars -> 4)
github_df[github_df['topic']=='flutter'][:10]

In [None]:
# Most Popular Repositories about java(10,000 or more stars -> 82)
github_df[github_df['topic']=='java'][:85]

In [None]:
# Most Popular Repositories about awesome(10,000 or more stars -> 0)
github_df[github_df['topic']=='awesome'][:10]

In [None]:
# Most Popular Repositories about javascript(10,000 or more stars -> 100+)
github_df[github_df['topic']=='javascript'][:100]

In [None]:
# Most Popular Repositories about c++(10,000 or more stars -> 100+)
github_df[github_df['topic']=='c++'][:100]

In [None]:
# Most Popular Repositories about Raspberry pi(10,000 or more stars -> 100+)
github_df[github_df['topic']=='Raspberry pi'][:10]

In [None]:
# Most Popular Repositories about Arduino(10,000 or more stars -> 4)
github_df[github_df['topic']=='Arduino'][:10]

In [None]:
# Most Popular Repositories about sensor(10,000 or more stars -> 0)
github_df[github_df['topic']=='sensor'][:10]