The first step is to find and retrieve Jenkinsfiles from various open-source repositories. We use GitHub's REST API for this. We access each Jenkinsfile's raw content so that we can append it to one text file that contains the contents of all the Jenkinsfiles. This streamlines parsing and generating analysis results later on.
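
As a rough illustration of the single-file goal, the sketch below gathers files named `Jenkinsfile<N>.txt` (the naming used by the download loop in this notebook) into one combined file. The function name `combine_jenkinsfiles`, the output name `all_jenkinsfiles.txt`, and the sample file bodies are assumptions made for this example, not part of the original notebook.

```python
import glob
import os
import re
import tempfile


def combine_jenkinsfiles(source_dir, output_path):
    """Append every Jenkinsfile<N>.txt in source_dir to output_path, in numeric order."""
    pattern = os.path.join(source_dir, "Jenkinsfile*.txt")
    numbered = []
    for path in glob.glob(pattern):
        match = re.fullmatch(r"Jenkinsfile(\d+)\.txt", os.path.basename(path))
        if match:  # Skip anything that is not a numbered download.
            numbered.append((int(match.group(1)), path))

    with open(output_path, "a", encoding="utf-8") as combined:
        for _, path in sorted(numbered):
            with open(path, encoding="utf-8") as source:
                text = source.read()
            combined.write(text)
            if not text.endswith("\n"):
                combined.write("\n")  # Keep consecutive files from running together.
    return len(numbered)


# Small self-check with two made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    samples = [(1, "pipeline { agent any }"), (2, "pipeline { stages { } }\n")]
    for n, body in samples:
        with open(os.path.join(tmp, "Jenkinsfile%d.txt" % n), "w", encoding="utf-8") as f:
            f.write(body)
    out = os.path.join(tmp, "all_jenkinsfiles.txt")
    count = combine_jenkinsfiles(tmp, out)
    with open(out, encoding="utf-8") as f:
        combined_text = f.read()
    print(count)
```

Sorting on the numeric suffix rather than the raw file name keeps `Jenkinsfile10.txt` after `Jenkinsfile2.txt`, which a plain alphabetical sort would not.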

We use a Jupyter Notebook from Anaconda with Python 3.6 and start with the following packages.

In [1]:
import requests  # HTTP client, used here to call the GitHub API.
import os  # For performing tasks in the shell.
import subprocess  # For performing tasks in the shell.
import logging  # For creating log files.

In [2]:
'''Configure the log file'''
LOG_FILENAME = 'course_proj.log'
logging.basicConfig(
    filename=LOG_FILENAME,
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s %(message)s',
    filemode='w'
)

In [3]:
'''Query the GitHub REST API with the Python requests package.

The search returns files whose names match "jenkinsfile" (the conventional name) and whose contents contain "pipeline" but not "node".'''

# Note: the paging parameters are joined with "&"; a second "?" would fold them into the search string q.
repositories = requests.get('https://api.github.com/search/code?q=pipeline+NOT+node+in:file+filename:jenkinsfile&page=1&per_page=100', auth=('kshirsagarpratik', 'Chinku95'))
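
GitHub's code search takes the search expression in `q` and the paging controls in `page` and `per_page`. Building the query string from a dictionary keeps these parameters apart and handles the encoding. The following is a minimal sketch using only the standard library; `SEARCH_ENDPOINT` and `build_search_url` are illustrative names that do not appear in the original notebook.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/code"


def build_search_url(page=1, per_page=100):
    """Return the code-search URL for one page of results."""
    params = {
        # Same search terms as the query in this notebook.
        "q": "pipeline NOT node in:file filename:jenkinsfile",
        "page": page,
        "per_page": per_page,
    }
    # urlencode turns spaces into "+" and percent-encodes ":".
    return SEARCH_ENDPOINT + "?" + urlencode(params)


url = build_search_url(page=2)
print(url)
```

The resulting string can be passed to `requests.get` in place of a hand-written URL, and calling the helper with increasing `page` values is one way to walk through further result pages.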

In [4]:
'''Iteratively retrieve the raw text content of each jenkinsfile returned by the API call and write it to a numbered text file (Jenkinsfile1.txt, Jenkinsfile2.txt, ...).'''

message = 'Attempting to retrieve jenkinsfile(s) from various GitHub repositories through its REST API...'
print(message)
logging.info(message)

number = 1  # Suffix for the locally stored file names.
try:
    for repo in repositories.json()['items']:
        print(repo['url'])
        logging.info(repo['url'])
        # Fetch the file's metadata from the contents API (authenticated).
        jenkinsfile = requests.get(repo['url'], auth=('kshirsagarpratik', 'Chinku95'))
        print(jenkinsfile)
        logging.info(jenkinsfile)
        # The metadata carries a download_url that points at the raw file content.
        jenkinsfile_content = requests.get(jenkinsfile.json()['download_url'], auth=('kshirsagarpratik', 'Chinku95'))
        with open('Jenkinsfile' + str(number) + '.txt', 'a+') as file_pointer:
            file_pointer.write(jenkinsfile_content.text)
        number = number + 1
except Exception as e:
    print('Error occurred in retrieving jenkinsfile(s)')
    logging.error('Error occurred in retrieving jenkinsfile(s): %s', e)

Attempting to retrieve jenkinsfile(s) from various GitHub repositories through its REST API...
https://api.github.com/repositories/121249971/contents/Jenkinsfile?ref=6435ca20fcbf0f2d33c54f005879c556add89093
<Response [200]>
https://api.github.com/repositories/104853848/contents/Jenkinsfile?ref=59e0f2e65d7e7fb22190e0efc638acddcdb9df6f
<Response [200]>
https://api.github.com/repositories/77950722/contents/Jenkinsfile?ref=39741f7abbc70533cc8ffdc05c77473b2888e0f4
<Response [200]>
https://api.github.com/repositories/74116737/contents/Jenkinsfile?ref=f1335c7bb428b58ed0d464d0591ce72771bb95c8
<Response [200]>
https://api.github.com/repositories/100200400/contents/Jenkinsfile?ref=c1ca02f2b5f4d447b64c75b1708ff0225acf6226
<Response [200]>
https://api.github.com/repositories/77928436/contents/Jenkinsfile?ref=c42e99b7d2de4ef91e273b3ddb9b670f9b6148df
<Response [200]>
https://api.github.com/repositories/116803249/contents/Jenkinsfile?ref=9230501a85a80fd2869819be3126791b49e2270e
<Response [200]>
http