# Welcome to the challenge!

The challenge is in two parts, please complete both to the best of your ability.

## Submission
To start working, duplicate this notebook to your drive or download it and work on it in locally.
To submit your participation, upload the final `.ipynb` file to the submission form. 

Good luck!


# <u>Part one</u>

This challenge was designed to test your creativity in an unconventional scenario. There are two lists with varying levels of difficulty, `hashes_easy` and `hashes_medium`.
Your job is to find out, or approximate as best as you can the hidden hash function. These functions are purely made of a combination of binary operations.

Example of a hash function:
```python
def hash_function_test(x):
  return x & 2

# you only get the `hashes_test`
hashes_test = [hash_function_test(x) for x in range(2048)]
```

## Solution's format
You can use any amount of precomputation you want.
However, the solution must be thought of as entirely standalone. Any resources you use, whether that be helper functions, vectors of coefficients, or anything else must be included in the function's scope

e.g.
```python
def solution(x):
  # some_utility must be defined inside the solution function
  def some_utility(y):
    return y*2
  
  # same for constants
  coefficients = [1,2,3,4]
  
  return some_utility(x) * coefficients[x%4]
```

You can only assume that `numpy` is imported (as `np`), however you can install arbitrary packages using 
`!pip install package` and use them for your precomputation.

## Scoring
Your score is based on the __length__ (in characters!) of the solution you provide, together with the proximity to the ground truth. Special characters are not counted (newlines, spaces, tabs, general symbols), and the first 100 characters are also not counted. To see the exact definition of the score, check the code of the `evaluate` function. **Please document your approach** as it will strongly be considered in our evaluation.


In [1]:
!pip install numpy



In [2]:
import numpy as np
import inspect
import re
import json
from typing import List, Callable, Union

In [3]:
ignored_characters = re.compile("[^A-Za-z0-9\,\;]")

def compute_prediction_score(truth: List[int], solution: Callable[[int], int]) -> float:
    prediction_score = 0
    for i in range(len(truth)):
        distance = np.abs(truth[i] - solution(i))
        prediction_score += (10-distance) if distance < 10 else 0
    prediction_score /= len(truth)
    return prediction_score

def evaluate(truth: List[int], solution: Callable[[int], int]) -> float:
    """ 
    Returns the loss of a solution 
    :param truth: array of ground truth hashes
    :param solution:  solution function, which takes an index and returns the 
                      predicted hash

    :return: the score as a float in the range [0,10]
    """
    prediction_score = compute_prediction_score(truth, solution)
    
    print("Average prediction score: ", prediction_score)
    
    # remove `def function_name(x):` from the source
    source = inspect.getsource(solution)
    source = source[source.index(":")+1:]

    length_score = len(source) - len(ignored_characters.findall(source))
    length_score -= 100
    length_score /= 100
    length_score = length_score if length_score > 1.0 else 1.0
    
    print("Length score: ", length_score)
    
    score = prediction_score / length_score
    print("Final score: ", score)
    return score

In [4]:
!curl -O https://x80-public.s3.eu-west-3.amazonaws.com/hashes.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  4496  100  4496    0     0  12076      0 --:--:-- --:--:-- --:--:-- 12118


In [5]:
with open("hashes.json", "r") as f:
    hashes = json.loads(f.read())
hashes_easy, hashes_medium, hashes_hard = hashes["hashes_easy"], hashes["hashes_medium"], hashes["hashes_hard"]

## Example solution

In [6]:
def example_solution(x):
    return x * x + 2

evaluate(hashes_easy, example_solution)

Average prediction score:  0.28125
Length score:  1.0
Final score:  0.28125


0.28125

# Level 1
### Explanation of the approach
The loop_values is repeated over and over therefore, to guess the index x I can simply calculate the remainder of the division x/5. Then from the index I calcuate the value in the loop_values array. This is a direct approach which means I dont need to calculate the value of the index (x-1) to get the value of x.

In [7]:
def mysolution_easy(x: int) -> int:
    loop_values = [1, 4, 7, 10, 3]
    loop_leng = len(loop_values)
    indx = x % loop_leng
    return loop_values[indx]

evaluate(hashes_easy, mysolution_easy)

Average prediction score:  10.0
Length score:  1.0
Final score:  10.0


10.0

# Level 2
### Explanation of the approach
The pattern consist on a loop of 1 series of 7 and 3 series of 6 values. Between each series the value dicrease of 18 than for each value in the series the values increase of 7.

To emulate this trend a dynamic approach was used. The trick consist of counting how many series of 7 (s7), of 6 (s6) and in what position (pos) in the series is placed the given index x. This is easily done in a foor loop moving between the series until reaching the index x. At the end of the loop the values of s7, s6, pos are given therefore, to guess the value of the index x it is possible to apply the easy formula:

s7 * 42 + s6 * 35 + pos * 7 - sub

The formula simply counts the series and positions and multiply for the respected sum values (42 for a series of 7, 35 for a series of 6, 7 for each position) before subtracting 18 * s (where s are the total series : s = s6 + s7)

In [8]:
def mysolution_medium(x: int) -> int:
    s = 0  
    s7 = 0  
    s6 = 0  
    pos = 0  
    
    for indx in range(x):
        pos += 1

        if (pos == 6 and s== 0):  # skip first serie to start indexing from 0
            pass
        
        elif (pos == 6 and s%4 != 0):  # serie of 6
            s += 1  
            s6 += 1  
            pos = 0 
            
        elif pos == 7:  # serie of 7
            s += 1  
            s7 += 1 
            pos = 0 

    #calculate
    sub = 18*s 
    return(s7*42 + s6*35 + pos*7 - sub)

evaluate(hashes_medium, mysolution_medium)

Average prediction score:  10.0
Length score:  1.0
Final score:  10.0


10.0

# [Optional] Over 9000
### No character limit - not for the faint of heart
The pattern consist on adding and subtracting odd numbers starting from 1 (odd) in triplets. First, the next odd number is add to a serie starting from 127 (base). Then the odd number is substracted before adding again the next odd. The following rules are then being applied:

- After this triplet an incremental value of 20 (if even serie) or 36 (odd serie) is added. The incremental value start from 35 (incr).

- The next triplet follow the same pattern but the odd number subtracted and added change to the next odd number.

- At index 128 the values is decreased once of exaclty 128.


Again, to emulate this trend a dynamic approach was used in a for loop replicating the steps above.  

In [9]:
def mysolution_hard(x: int) -> int:
    odd = 1  # starting odd value
    base = 127 # starting value 
    incr = 0  # increment value
    add = [35, 36, 20, 128]
    pos = 0  # position in the triplet
    serie = 0 
    
    for indx in range(x):
        
        if pos == 3:  # at the end of each triplet
            
            if serie == 0:  # if it is the first triplet
                incr += add[0] 
            elif serie%2 == 0:  # if serie is even 
                incr += add[1] 
            else:  # if serie is odd
                incr += add[2]
            
            if indx == 127:  
                incr += add[3]
                base += incr 
                incr -= add[3]
            else:
                base += incr
                
            # reset 
            pos = 0
            serie += 1 
            odd += 4
            
        else:  # if we are still inside the triplet
            
            if pos%2 != 0:  # is position is odd
                base -= odd 
            else:
                base += odd + 2  # if position is even
            pos += 1

    return base

compute_prediction_score(hashes_hard, mysolution_hard)

10.0

# <u>Part Two</u>

For this exercise, you will do a mini-integration of the Github API.
Write production grade code which, provided a public github repository, does the following:

1. Get the following information
- the top 6 first collaborators
- all the repositories of the collaborators
- all the groups of the collaborators

2. Be able to query the data:
- query any user information
- query any repository information
- group the users by organisations

The questions are purposely left open-ended to allow you to create the structure as you see fit. 
You should assume this code is not "use once only", but would be augmented adding more features.


*Note: keep in mind there is an API rate limit of 60 per hour, be mindful with your calls and use the documentation.*

In [12]:
import pandas as pd

""" Requests module for API calls """
import requests

import unittest

from typing import List, Dict, Optional, Callable, Union

PandasDf = 'pandas.core.frame.DataFrame'   # pandas DataFrame type


class GitHubRepo:
    """ Shows and query data using GitHub API

    Attributes:
        user (string): username of the selected GitHub user
        repo (string): user repository name 
        status (bool): status of the HTTP request to GitHub API.
                       True means connection is enstablished.
        code (int): code response of the HTTP request
        remaining_requests (int): remaining GitHub API calls. 
                                  Limit is 60 per hour
        GIT_URL (string): main GitHub API url
        REPO_URL (string): GitHub API url of the instance repository
        contributors (List[Dict]): list containing all the contributors
                                   info of the repository
        top_contributors (List[Dict]): list containing top contributors 
                                       info based on number of 
                                       contributions to the repository
    """
    
    def __init__(self, user: str, repo: str) \
                 -> Union[List[dict], int, bool]:
        """ Return the json file of the repository

        An HTTP request is forwarded to the GitHub API which returns
        a list containing the json file of the selected 
        repository (repo) of the GitHub user (user). If the returned
        requests contains an error code it is returned instead of the
        list. Whether no connection with the GitHub API can be 
        enstablished or if a non string type is passed to the __init__ 
        function it returns False.

        Args:
            user (string): GitHub username
            repo (string): GitHub repository name of the user
        Returns:
            List[dict]: which includes the main information of the 
                         requested repo if repo exist and there is a 
                         API response
            int: False, if there is no API response
            bool: if there API response includes an error code 
        """
        self.user = user
        self.repo = repo
        self.status = None 
        self.code = None        
        self.remaining_requests = None      
        self.GIT_URL = "https://api.github.com"
        self.REPO_URL = self.GIT_URL + "/repos/" + user + "/" + repo
        self.request(self.REPO_URL)

    def show_top_contribs(self, max_contributors: Optional[int]=6) \
                          -> Union[PandasDf, None]:
        """ Display top contributors of the instance repository 

        Args:
            max_contributors (Optional[int]): number of contributors 
                                              to show. Default is 6.
        Returns:
            PandasDf: pandas DataFrame containing top_contributors 
                       name and infos
            None: if there is no API response 
        """
        self.get_top_contribs(max_contributors)     
        top_df = pd.DataFrame(self.top_contributors)
        return top_df

    def show_top_contribs_repos(self, 
                                max_contributors: Optional[int]=6) \
                                -> Union[PandasDf, None]:
        """ Display all the GitHub repository of each contributor of 
            the current instance repository

        Args:
            max_contributors (Optional[int]): number of contributors. 
            Default is 6.

        Returns:
            PandasDf: pandas DataFrame of the repositories 
                       info of the top_contributors
            None: if there is no API response 
        """
        frames = []     # create a list to append pandas DataFrames
        self.get_top_contribs(max_contributors)
        for contrib in range(len(self.top_contributors)):       
            CONTRIB_REPOS_URL = (
                self.GIT_URL
                + "/users/"
                + self.top_contributors[contrib]["login"]
                + "/repos"
                )
            req = self.request(CONTRIB_REPOS_URL)
            if self.return_error_code(req):     
                return
            repos = req.json()
            df = pd.DataFrame(repos)
            frames.append(df)
            caption = "Repository of " \
                      + str(self.top_contributors[contrib]["login"])
            df = df.style.set_caption(caption)
            display(df)
            
        return pd.concat(frames)       

    def show_top_contribs_orgs(self, max_contributors: Optional[int]=6) \
                           -> Union[PandasDf, None]:
        """ Display the top contributors divided per organization

        Args:
            max_contributors (Optional[int]): number of contributors. 
            Default is 6.

        Returns:
            PandasDf: pandas DataFrame of the contributors associated
                      with each organization
            None: if there is no API response 
        """
        self.get_top_contribs(max_contributors)
        top_orgs_df = self.contribs_orgs(self.top_contributors, 
                                         print_orgs=True) 
        return top_orgs_df

    def get_contrib_info(self, contrib: str) \
                         -> Union[PandasDf, None]:
        """ Shows the infos of a selected contributors
        
        Args:
            contrib (str): contributor login

        Returns:
            PandasDf: pandas DataFrame showing all the 
                      contrib information
            None: if there is no API response 
        """
        MAIN_CONTRIB_URL = self.GIT_URL + "/users/" + contrib
        req = self.request(MAIN_CONTRIB_URL)
        if self.return_error_code(req):
            return
        contrib_infos = req.json()
        df = pd.DataFrame()
        df = df.append(contrib_infos, ignore_index=True)
        return df

    def get_contrib_repos_info(self, contrib: str) \
                               -> Union[PandasDf, None]:
        """ Shows all the repository of a contributor

        Args:
            contrib (string): contributor login

        Returns:
            PandasDf: pandas DataFrame showing all the 
                      contrib information
            None: if there is no API response 
        """
        CONTRIB_REPOS_URL = (self.GIT_URL 
                            + "/users/" 
                            + contrib  
                            + "/repos"
                            )
        req = self.request(CONTRIB_REPOS_URL)
        if self.return_error_code(req):
            return
        contrib_repo_infos = req.json()
        df = pd.DataFrame()
        df = df.append(contrib_repo_infos, ignore_index=True)
        caption = (contrib +  " Repositories")
        df = df.style.set_caption(caption)
        return df

    def get_contribs_by_orgs(self) -> Union[PandasDf, None]:
        """ Shows all the contributors divided per organization

        Returns:
            PandasDf: pandas DataFrame of organizations 
            and contributors
            None: if there is no API response 
        """
        self.get_all_contribs()
        orgs_df = self.contribs_orgs(self.contributors, 
                                     print_orgs=False)
        return orgs_df

    def get_all_contribs(self) -> Union[List[dict], None]:
        """ Gets all the repository contributors

        Returns:
            List[dict]: returns the attribute self.contributors
            None: if there is no API response 
        """
        try:
            self.contributors
        except:
            CONTRIBUTORS_URL = self.REPO_URL + "/contributors"
            req = self.request(CONTRIBUTORS_URL)
            if self.return_error_code(req):
                return
            self.contributors = req.json()
            return self.contributors

    def get_top_contribs(self, max_contributors: int) \
                         -> Union[List[dict], None]:
        """ Gets the repository top contributors based on
            number of contribution

        Returns:
            List[dict]: returns the attribute self.top_contributors
            None: if there is no API response 
        """
        try:
            self.contributors # try contributors, otherwise recalculate
        except:
            contributors = self.get_all_contribs()  

        if contributors is None:    # if request still None return 
            return
        
        # if top_contributors exists check list length and 
        # do not recalculate if it's greater than max_contributors
        try:
            self.top_contributors   
            if len(self.top_contributors) > max_contributors:
                self.top_contributors = ( 
                    self.top_contributors[:max_contributors]
                    )   
                return self.top_contributors    
            else:
                raise Exception
        except: # set lenght of top_contributors
            if len(self.contributors) >= max_contributors:
                self.max_contributors = max_contributors
            else:
                self.max_contributors = len(self.contributors)
        
        # append from self.contributors to self.top_contributors
        self.top_contributors = []
        for contrib in range(self.max_contributors):    
            self.top_contributors.append(self.contributors[contrib])
        return self.top_contributors

    def contribs_orgs(self, contrib_list: List[dict], 
                      print_orgs: Optional[bool]=True) \
                      -> Union[PandasDf, None]:
        """ Shows organization name of a repo contributors list

        Args:
            contrib_list (List[dict]): list of contributors, can be both
                                       self.contributors or 
                                       self.top_contributors
            print_orgs (bool): if True shows for each contributors a 
                               Pandas DataFrame of the organization info.
                               if False shows a summary of organizations
                               and associated contributors 
                               in a Pandas DataFrame
        Returns:
            PandasDf: pandas Dataframe of the contribuors organizations
            None: if there is no API response 
        """
        frames = []     # a list to append DataFrames
        group = {}      # a dictionary of contrib/orgs pair

        # getting the organizations for each contrib
        for contrib in range(len(contrib_list)):
            CONTRIB_ORGS_URL = (
                self.GIT_URL + "/users/" +
                contrib_list[contrib]["login"] + "/orgs"
            )
            req = self.request(CONTRIB_ORGS_URL)
            if self.return_error_code(req):
                return
            orgs = req.json()   

            # if the contrib has organizazion add to group
            if len(orgs) > 0:   
                for org in range(len(orgs)):
                    if orgs[org]["login"] not in group:
                        group[orgs[org]["login"]] = []
                    group[orgs[org]["login"]].append(
                        contrib_list[contrib]["login"])
                # shows organization info for each contrib
                if print_orgs:
                    df = pd.DataFrame(orgs)
                    frames.append(df)
                    caption = ( str(contrib_list[contrib]["login"]) 
                        + " organizations"
                        )
                    df = df.style.set_caption(caption)
                    display(df)
            else:
                print(contrib_list[contrib]["login"], 
                    " is not part of any organization"
                    )
        # shows a summary of the contrib/orgs pair
        if print_orgs == False:
            df = pd.DataFrame()
            df = df.append(group, ignore_index=True)
            return df
        else:
            return pd.concat(frames)

    def return_error_code(self, req: Union[List[dict], int, bool]) \
                          -> bool:
        """ If HTTP request return an error code return True

        Args:
            req (Union[List[dict], int, bool]): 
                        return of the GitHub API HTTP request.
                        If it's a List: correct response
                        If it's a int: error code in the response
                        If it's a bool: no connection with API 

        Returns:
            bool: True if there is an error code
        """
        if isinstance(req, int):
            print("Error", req)
            return True

    def request(self, url: str) -> Union[List[dict], int, bool]:
        """ Attempt an HTTP request to the given GitHub API url
        
        Args:
            url(str): GitHub API url
        
        Returns:
            req (Union[List[dict], int, bool]): 
                        return of the GitHub API HTTP request.
                        If it's a List: correct response
                        If it's a int: error code in the response
                        If it's a bool: no connection with API    
        """
        # if connection is instablished, status=True
        try:
            req = requests.get(url)
            self.status = True
            self.code = req.status_code
            self.remaining_requests = ( 
                int(req.headers["X-RateLimit-REmaining"])
                )
        except Exception:  # if no connection return status False
            self.status = False
            return self.status

        # return error code if no correct response (code=200)
        if self.code != 200:    
            return self.code    
        
        return req  # return correct response


class ApiTest(unittest.TestCase):
    """ Used to unitest the GitHub class """

    def test_init(self):
        # max API connection test 
        # (run only if 60 coonetion per hour are reached) 
        max_connection_test = GitHubRepo("microsoft", "terminal")
        if max_connection_test.code == 403:
            self.assertEqual(max_connection_test.remaining_requests, 0)

        # name error test
        with self.assertRaises(Exception) as cm:
            name_error_test = GitHubRepo(user, repo)
        the_exception = cm.exception
        self.assertEqual(type(the_exception).__name__, "NameError")

        # no connection test
        with self.assertRaises(Exception) as cm:
            no_connection_test = GitHubRepo("microsoft", "terminal")
            raise requests.exceptions.ConnectionError
        the_exception = cm.exception
        self.assertEqual(type(the_exception).__name__, "ConnectionError")

        # ok repo test (run only if:
        # connection works (status=True) 
        # and no max connection are reached (code=403))
        real_repo_test = GitHubRepo("microsoft", "terminal")
        if real_repo_test.status and real_repo_test.code != 403:
            self.assertEqual(real_repo_test.code, 200)

        # test fake repo (run only if:
        # connection works (status=True) 
        # and no max connection are reached (code=403))
        fake_repo_test = GitHubRepo("fakeUsername", "fakeRepo")
        if fake_repo_test.status and fake_repo_test.code != 403:
            self.assertEqual(fake_repo_test.code, 404)

    def test_get_all_contribs(self): 
        # run only if correct response (code=200)    

        # request 30 contribs and check response is 30
        all_contribs_test = GitHubRepo("microsoft", "terminal")
        if all_contribs_test.code != 200:
            return
        contributors = all_contribs_test.get_all_contribs()
        self.assertTrue(len(contributors), 30)

    def test_get_top_contribs(self):
        # run only if correct response (code=200)

        # request 6 top contribs and check reponse is 6
        top_six_contribs_test = GitHubRepo("microsoft", "terminal")
        if top_six_contribs_test.code != 200:
            return
        top_contributors = top_six_contribs_test.get_top_contribs(6)
        self.assertEqual(len(top_contributors), 6)

        # request 31 top contribs (all are 30) and check reponse is 30
        top_31_contribs_test = GitHubRepo("microsoft", "terminal")
        if top_31_contribs_test.code != 200:
            return
        top_contributors = top_31_contribs_test.get_top_contribs(31)
        self.assertEqual(len(top_contributors), 30)

    def test_show_top_contribs(self):
        # run only if correct response (code=200)
        show_top_contribs_test = GitHubRepo("microsoft", "terminal")
        if show_top_contribs_test.code != 200:
            return
        top_df = show_top_contribs_test.show_top_contribs(6)
        index = top_df.index
        number_of_rows = len(index)
        self.assertEqual(number_of_rows, 6)

    def test_show_top_contribs_repos(self):
        # run only if correct response (code=200)
        show_top_contribs_repos_test = GitHubRepo("microsoft", "terminal")
        if show_top_contribs_repos_test.code != 200:
            return
        allrepos = show_top_contribs_repos_test.show_top_contribs_repos(2)
        index = allrepos.index
        number_of_rows = len(index)
        self.assertEqual(number_of_rows, 38)

    def test_show_top_contribs_orgs(self):
        # run only if correct response (code=200)
        show_contribs_orgs_test = GitHubRepo("microsoft", "terminal")
        if show_contribs_orgs_test.code != 200:
            return
        orgs = show_contribs_orgs_test.show_top_contribs_orgs(4)
        index = orgs.index
        number_of_rows = len(index)
        self.assertEqual(number_of_rows, 4)

    def test_get_contrib_info(self):
        # run only if correct response (code=200)
        contrib = GitHubRepo("microsoft", "terminal")
        if contrib.code != 200:
            return
        contrib_df = contrib.get_contrib_info("DHowett")
        contrib_login = contrib_df.at[0, "login"]
        self.assertEqual(contrib_login, "DHowett")

    def test_get_contrib_repos_info(self):
        # run only if correct response (code=200)
        repo = GitHubRepo("microsoft", "terminal")
        if repo.code != 200:
            return
        contrib_repos_df = repo.get_contrib_repos_info("DHowett")
        index = contrib_repos_df.index
        number_of_rows = len(index)
        self.assertEqual(number_of_rows, 30)

    def test_get_contribs_by_orgs(self):
        # run only if correct response (code=200)
        repo = GitHubRepo("microsoft", "terminal")
        if repo.code != 200:
            return
        orgs_df = repo.get_contribs_by_orgs()
        number_of_columns = len(orgs_df.columns)
        self.assertEqual(number_of_columns, 16)

Myrepo = GitHubRepo("OISF", "suricata")
Myrepo.get_contrib_info("jlucovsky")
# if __name__ == '__main__':
#     unittest.main(argv=[''], exit=False)

Unnamed: 0,login,id,node_id,avatar_url,gravatar_id,url,html_url,followers_url,following_url,gists_url,...,email,hireable,bio,twitter_username,public_repos,public_gists,followers,following,created_at,updated_at
0,jlucovsky,45717383.0,MDQ6VXNlcjQ1NzE3Mzgz,https://avatars.githubusercontent.com/u/457173...,,https://api.github.com/users/jlucovsky,https://github.com/jlucovsky,https://api.github.com/users/jlucovsky/followers,https://api.github.com/users/jlucovsky/followi...,https://api.github.com/users/jlucovsky/gists{/...,...,,1.0,,jlucovsky,4.0,0.0,4.0,0.0,2018-12-08T16:38:12Z,2022-01-19T15:47:27Z


# Want to share some feedback? Please do so here!

# I really liked the hash challenge :)