## Microtask #4: Software Heritage data
> Create a Python script that fetches data from [Software Heritage](https://archive.softwareheritage.org/) using its [API](https://archive.softwareheritage.org/api/).
Given a target GitHub repository, the script should print a message if the repository is not available in Software Heritage, and otherwise the date of its last visit.
The script should rely on the [origin](https://archive.softwareheritage.org/api/1/origin/) and [visits](https://archive.softwareheritage.org/api/1/origin/visits/) endpoints.


#### What is [Software Heritage](https://archive.softwareheritage.org/)?
- The Software Heritage initiative aims to collect all publicly available software in source code form, together with its development history, replicate it massively to ensure its preservation, and share it with everyone who needs it.


**We'll start off by importing the required modules**

```python
import requests
from pprint import pprint
```

**Base URL of the Software Heritage API**

```python
# base request URL
base_url = "https://archive.softwareheritage.org/api"
```

**We'll need two endpoints:**
- `origin_endpoint`: to get the `origin_id` of a given GitHub repository
- `visits_endpoint`: to get the `visits` information for that `origin_id` (i.e. for the given GitHub repository)
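The origin response recorded later in this notebook also carries an `origin_visits_url` field (`/api/1/origin/15333418/visits/`), so one option is to follow that link instead of formatting `visits_endpoint` by hand. A minimal sketch; `SITE_ROOT` and `visits_url_from_origin` are hypothetical names, and it assumes the field is a site-relative path, as it was in that recorded response.

```python
from urllib.parse import urljoin

# Hypothetical constant: the site root that relative API links resolve against.
SITE_ROOT = "https://archive.softwareheritage.org"


def visits_url_from_origin(origin_json: dict) -> str:
    """Resolve the (assumed site-relative) origin_visits_url into an absolute URL."""
    return urljoin(SITE_ROOT, origin_json["origin_visits_url"])


# Usage with the fields recorded from the origin endpoint for refined-github
origin_json = {"id": 15333418, "origin_visits_url": "/api/1/origin/15333418/visits/"}
print(visits_url_from_origin(origin_json))
# https://archive.softwareheritage.org/api/1/origin/15333418/visits/
```

Following the link the server returns means the client no longer hard-codes the visits path.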

```python
# setting URL templates for the origin and visits endpoints
origin_endpoint = "/1/origin/git/url/{origin_url}"
visits_endpoint = "/1/origin/{origin_id}/visits/"
```
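Filling these templates is plain string formatting. The sketch below wraps it in two small functions: `legacy_urls` reproduces the paths used in this notebook, while `current_urls` is an assumption about today's API, which as far as I know identifies an origin by its URL (`/1/origin/<url>/get/`, `/1/origin/<url>/visit/latest/`) rather than by a numeric `origin_id`. Both function names are hypothetical; check the API documentation before relying on the assumed paths.

```python
from __future__ import annotations

BASE_URL = "https://archive.softwareheritage.org/api"


def legacy_urls(repo_url: str, origin_id: int) -> tuple[str, str]:
    """Origin and visits URLs for the 2018-era endpoints used in this notebook."""
    origin = f"{BASE_URL}/1/origin/git/url/{repo_url}"
    visits = f"{BASE_URL}/1/origin/{origin_id}/visits/"
    return origin, visits


def current_urls(repo_url: str) -> tuple[str, str]:
    """ASSUMPTION: URL-keyed endpoints; no origin_id lookup step is needed."""
    origin = f"{BASE_URL}/1/origin/{repo_url}/get/"
    latest = f"{BASE_URL}/1/origin/{repo_url}/visit/latest/"
    return origin, latest


print(legacy_urls("https://github.com/sindresorhus/refined-github", 15333418)[1])
# https://archive.softwareheritage.org/api/1/origin/15333418/visits/
```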

**Input: GitHub repository URL**

```python
# getting the repository URL from the user (surrounding whitespace removed)
REPOSITORY_URL = input().strip()
```

```
https://github.com/sindresorhus/refined-github
```
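If the archive matches origins on the exact URL string (an assumption), stray whitespace, a trailing slash or a `.git` suffix in the input could make an archived repository look missing. A sketch of a hypothetical `normalize_github_url` helper, assuming GitHub origins are stored as `https://github.com/<owner>/<repo>`, as in the recorded response for refined-github:

```python
from urllib.parse import urlparse


def normalize_github_url(raw: str) -> str:
    """Hypothetical helper: canonicalise input to https://github.com/<owner>/<repo>."""
    url = raw.strip()
    if "://" not in url:
        url = "https://" + url  # accept "github.com/owner/repo" without a scheme
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {raw!r}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected https://github.com/<owner>/<repo>, got {raw!r}")
    owner, repo = parts[0], parts[1]  # ignore extra path such as /tree/main
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"https://github.com/{owner}/{repo}"


print(normalize_github_url(" github.com/sindresorhus/refined-github.git/ "))
# https://github.com/sindresorhus/refined-github
```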


**We'll now make a `GET` request to `origin_endpoint` to get the `origin_id` of `REPOSITORY_URL`**

```python
origin_endpoint_url = base_url + origin_endpoint.format(origin_url=REPOSITORY_URL)
# a timeout keeps the notebook from hanging if the archive does not respond
request = requests.get(origin_endpoint_url, timeout=30)
print("Status code: ", request.status_code)
```

```
Status code:  200
```
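The notebook treats every non-200 status as "not in the archive". A slightly stricter sketch separates the 404 case from other failures, such as rate limiting or server errors, so a transient error is not reported as a missing repository. The mapping and the `origin_lookup_outcome` name are assumptions, not part of the original script.

```python
def origin_lookup_outcome(status_code: int) -> str:
    """Map an HTTP status from the origin endpoint to a coarse outcome (assumed mapping)."""
    if status_code == 200:
        return "found"         # origin exists in the archive
    if status_code == 404:
        return "not archived"  # assumption: 404 means the origin is unknown
    return "error"             # e.g. 429 rate limit or 5xx; worth retrying later


print(origin_lookup_outcome(200))  # found
```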


**Defining helper functions for each step**

```python
def get_visits(visit_request_object):
    '''
    :param visit_request_object: response of a GET request to visits_endpoint (status 200)
    '''

    # getting the visit history
    visits = visit_request_object.json()
    print("return JSON DATA FROM VISITS ENDPOINT")
    pprint(visits)
    print()

    # guard against an origin with no recorded visits
    if not visits:
        print("No visits found.")
        return

    # picking the latest one (the API listed visits newest first in the run below)
    last_visit = visits[0]
    print("LAST VISIT DATA: ")
    pprint(last_visit)

    print("\nLAST VISIT DATE: ", last_visit["date"])


def found_in_software_heritage(request_object):
    '''
    :param request_object: response of a GET request to origin_endpoint (status 200)
    '''

    print("Found origin in Software Heritage archive.")
    print()

    # get the JSON body of the response
    json_data = request_object.json()
    print("return JSON DATA FROM ORIGIN ENDPOINT")
    pprint(json_data)

    # get the corresponding origin_id of the repository
    origin_id = json_data["id"]
    print("ORIGIN ID of {repo}: {originid}".format(repo=REPOSITORY_URL, originid=origin_id))

    # make a GET request to the visits endpoint using the origin_id of the archived origin
    visits_endpoint_url = base_url + visits_endpoint.format(origin_id=origin_id)
    visit_request = requests.get(visits_endpoint_url, timeout=30)

    if visit_request.status_code == 200:
        get_visits(visit_request)
    else:
        print("No visits found.")
```
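`get_visits` takes `visits[0]` as the latest visit, which relies on the API returning visits newest first, as it did in the recorded run. A sketch of a hypothetical `latest_visit` helper that instead picks the maximum `date`, and can optionally skip partial visits (the most recent visit in that run has `'status': 'partial'` and no snapshot):

```python
from __future__ import annotations

from datetime import datetime
from typing import Optional


def latest_visit(visits: list[dict], only_full: bool = False) -> Optional[dict]:
    """Return the visit with the most recent 'date', or None if none qualify.

    Assumes dates look like '2018-12-24T08:41:47.595349+00:00', a format
    datetime.fromisoformat parses on Python 3.7+.
    """
    candidates = [v for v in visits if not only_full or v.get("status") == "full"]
    if not candidates:
        return None
    return max(candidates, key=lambda v: datetime.fromisoformat(v["date"]))


# Usage with a subset of the visits recorded for refined-github
visits = [
    {"date": "2018-10-14T02:00:49.180673+00:00", "status": "full", "visit": 18},
    {"date": "2018-12-24T08:41:47.595349+00:00", "status": "partial", "visit": 19},
]
print(latest_visit(visits)["visit"])                  # 19
print(latest_visit(visits, only_full=True)["visit"])  # 18
```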

```python
if request.status_code == 200:
    # Found the origin in the archive
    found_in_software_heritage(request)
else:
    # Cannot find the origin in the archive
    print("Requested origin cannot be found in Software Heritage archive.")
```

```
Found origin in Software Heritage archive.

return JSON DATA FROM ORIGIN ENDPOINT
{'id': 15333418,
 'origin_visits_url': '/api/1/origin/15333418/visits/',
 'type': 'git',
 'url': 'https://github.com/sindresorhus/refined-github'}
ORIGIN ID of https://github.com/sindresorhus/refined-github: 15333418
return JSON DATA FROM ORIGIN ENDPOINT
[{'date': '2018-12-24T08:41:47.595349+00:00',
  'origin': 15333418,
  'origin_visit_url': '/api/1/origin/15333418/visit/19/',
  'snapshot': None,
  'snapshot_url': None,
  'status': 'partial',
  'visit': 19},
 {'date': '2018-10-14T02:00:49.180673+00:00',
  'origin': 15333418,
  'origin_visit_url': '/api/1/origin/15333418/visit/18/',
  'snapshot': '597ae05ffb27b4e648cf795e88172d029f5d282b',
  'snapshot_url': '/api/1/snapshot/597ae05ffb27b4e648cf795e88172d029f5d282b/',
  'status': 'full',
  'visit': 18},
 {'date': '2018-10-11T13:26:17.418788+00:00',
  'origin': 15333418,
  'origin_visit_url': '/api/1/origin/15333418/visit/17/',
  'snapshot': '5f40075675890b
```
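The task asks for a script that returns either a "not available" message or the last visit date. Pulling the steps above into one function makes that contract explicit and testable offline. A sketch using only the standard library (`urllib.request` instead of `requests`); `fetch_json` and `last_visit_message` are hypothetical names, and the endpoint paths are the 2018-era ones used in this notebook, which may have changed since.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request
from datetime import datetime
from typing import Any, Callable

BASE_URL = "https://archive.softwareheritage.org/api"


def fetch_json(url: str) -> Any:
    """GET url and decode the JSON body; return None on HTTP 404 (unknown origin)."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors are real failures, not "not archived"


def last_visit_message(repo_url: str, fetch: Callable[[str], Any] = fetch_json) -> str:
    """Return the 'not available' message or the date of the latest visit."""
    origin = fetch(f"{BASE_URL}/1/origin/git/url/{repo_url}")
    if origin is None:
        return f"{repo_url} is not available in the Software Heritage archive."
    visits = fetch(f"{BASE_URL}/1/origin/{origin['id']}/visits/") or []
    if not visits:
        return f"{repo_url} is archived, but no visits are recorded."
    # pick by date rather than relying on the order the API returns
    last = max(visits, key=lambda v: datetime.fromisoformat(v["date"]))
    return f"Last visit of {repo_url}: {last['date']}"


# Offline usage: a dict lookup stands in for the network (None when a URL is absent)
canned = {
    f"{BASE_URL}/1/origin/git/url/https://github.com/sindresorhus/refined-github": {"id": 15333418},
    f"{BASE_URL}/1/origin/15333418/visits/": [
        {"date": "2018-10-14T02:00:49.180673+00:00"},
        {"date": "2018-12-24T08:41:47.595349+00:00"},
    ],
}
print(last_visit_message("https://github.com/sindresorhus/refined-github", canned.get))
# Last visit of https://github.com/sindresorhus/refined-github: 2018-12-24T08:41:47.595349+00:00
```

Injecting `fetch` keeps the decision logic separate from the network, which is what lets the example run without contacting the archive.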

- This concludes Microtask #4 ;)