### GitHub Dataset

This dataset contains an array of events from a Git repository server. Among others, each event has the following attributes:

- `id`: unique numeric identifier of the event (serialized as a string in the example below)
- `type`: string name of the event type (e.g. "PushEvent", "PullRequestEvent", "IssuesEvent")
- `actor`: key-value map identifying the actor who issued the event
- `repo`: key-value map identifying the repository related to the event
- `payload`: key-value map with different fields for each event type
- `public`: boolean flag indicating whether the event is public
- `created_at`: string with the date and time of the event in the format YYYY-MM-DDTHH:MM:SSZ, where T separates the date from the time and Z marks UTC (e.g. 2018-01-01T15:00:00Z)
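
Since the notebook runs on a Python kernel, the `created_at` format can also be checked outside JSONiq. The following is a minimal sketch using only the standard library; `parse_created_at` is a hypothetical helper name, not part of the dataset tooling.

```python
from datetime import datetime, timezone


def parse_created_at(value: str) -> datetime:
    """Parse a timestamp such as '2018-01-01T15:00:00Z' into an aware UTC datetime."""
    # The trailing 'Z' is matched literally by the format string,
    # so the UTC zone is attached explicitly afterwards.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


ts = parse_created_at("2018-01-01T15:00:00Z")
print(ts.year, ts.hour)  # prints: 2018 15
```

Because the format is fixed-width with the most significant field first, plain string comparison of two `created_at` values already orders them chronologically, so parsing is only needed when date arithmetic is required.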

PushEvents additionally carry information about the commits they contain. <br>
An example event can be seen below: <br>
{<br>
&emsp;"id":"7045118886", <br>
&emsp;"type":"PushEvent", <br>
&emsp;"actor":{ <br>
&emsp;&emsp; "id":20090775,<br>
&emsp;&emsp; "login":"lainrose",<br>
&emsp;&emsp; ...<br>
&emsp;},<br>
&emsp;"repo":{<br>
&emsp;&emsp; "id":115387592,<br>
&emsp;&emsp; "name":"lainrose/Python-Grammar",<br>
&emsp;&emsp; "url":"https://api.github.com/repos/lainrose/Python-Grammar"<br>
&emsp;},<br>
&emsp;"payload":{<br>
&emsp;&emsp; "push_id":2226161589,<br>
&emsp;&emsp; "commits":[<br>
&emsp;&emsp;&emsp;&nbsp;&nbsp;{<br>
&emsp;&emsp;&emsp;&emsp;"sha":"27a01fbdbec8e26daa40fc8faa052dd0be23836a",<br>
&emsp;&emsp;&emsp;&emsp;"author":{<br>
&emsp;&emsp;&emsp;&emsp;&emsp;"name":"lainrose",<br>
&emsp;&emsp;&emsp;&emsp;&emsp;"email":"fb4676bf30682e2ece361fd363a69ad11779c42e@Naver.com"<br>
&emsp;&emsp;&emsp;&emsp;},<br>
&emsp;&emsp;&emsp;&emsp;"message":"Update Study Contents",<br>
&emsp;&emsp;&emsp;&emsp;...<br>
&emsp;&emsp;&emsp;&nbsp;&nbsp;}<br>
&emsp;&emsp; ]<br>
&emsp;},<br>
&emsp;"public":true,<br>
&emsp;"created_at":"2018-01-01T15:00:00Z"<br>
}<br>
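
To make the nesting concrete, here is a small Python sketch that loads a trimmed copy of the example event and walks down to the commit authors. Fields that are elided with `...` in the example are left out rather than guessed, and the variable names are illustrative only.

```python
import json

# Trimmed copy of the example event: anything elided with "..." above
# is omitted here instead of being filled in.
raw = """
{
  "id": "7045118886",
  "repo": {"id": 115387592, "name": "lainrose/Python-Grammar"},
  "payload": {
    "push_id": 2226161589,
    "commits": [
      {
        "sha": "27a01fbdbec8e26daa40fc8faa052dd0be23836a",
        "author": {"name": "lainrose"},
        "message": "Update Study Contents"
      }
    ]
  },
  "public": true,
  "created_at": "2018-01-01T15:00:00Z"
}
"""

event = json.loads(raw)

# Rough Python counterpart of the JSONiq path $event.payload.commits[].author.name.
# Events without commits yield an empty list instead of raising an error.
commits = event.get("payload", {}).get("commits", [])
author_names = [commit["author"]["name"] for commit in commits]

print(event["repo"]["name"], author_names)
```

The `.get(..., default)` calls mirror how JSONiq navigation returns an empty sequence when a key is absent, which matters because only some event types carry a `commits` array.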

#### 1. How many repos contain both a DeleteEvent and a PushEvent?

In [None]:
%%jsoniq
let $c := count(
for $record in json-file("git-archive.json")
let $repo_id := $record.repo.id
group by $repo_id
let $x := [$record.type]
where some $i in $x[] satisfies $i = "DeleteEvent"
where some $i in $x[] satisfies $i = "PushEvent"
return {"id": $repo_id, "events": $x}
)
return $c
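
As a cross-check of the grouping logic, the same question can be phrased as a subset test in plain Python. This is a sketch on made-up sample events, not on the real file; `repos_with_all` is a hypothetical helper, and only the fields the query touches are included.

```python
from collections import defaultdict

# Made-up sample events (assumption: same field layout as the dataset).
events = [
    {"type": "PushEvent", "repo": {"id": 1}},
    {"type": "DeleteEvent", "repo": {"id": 1}},
    {"type": "PushEvent", "repo": {"id": 2}},
    {"type": "DeleteEvent", "repo": {"id": 3}},
    {"type": "PushEvent", "repo": {"id": 1}},
]


def repos_with_all(events, required_types):
    """Return the ids of repos that have at least one event of every required type."""
    types_by_repo = defaultdict(set)
    for event in events:
        types_by_repo[event["repo"]["id"]].add(event["type"])
    # A repo qualifies when the required types are a subset of the types seen for it.
    return {repo_id for repo_id, seen in types_by_repo.items() if required_types <= seen}


both = repos_with_all(events, {"DeleteEvent", "PushEvent"})
print(len(both))  # prints: 1
```

Collecting the types per repo into a set plays the role of `group by $repo_id`, and the subset test replaces the two `some ... satisfies` conditions.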

#### 2. How many records have the type ForkEvent?

In [None]:
%%jsoniq
count(distinct-values(
    for $record in json-file("git-archive.json")
    where $record.type = "ForkEvent"
    return $record.id
))

#### 3. What is the least common type of event?

In [None]:
%%jsoniq
for $record in json-file("git-archive.json")
let $type := $record.type
group by $type
order by count($record) ascending
return {"type": $type, "count": count($record)}

#### 4. How many PushEvents were there in the repo with the greatest number of ForkEvents?

In [None]:
%%jsoniq
for $record in json-file("git-archive.json")
let $repository := $record.repo.id
let $repository_name := $record.repo.name
group by $repository, $repository_name
let $events := [count($record[$$.type = "PushEvent"]), count($record[$$.type = "ForkEvent"])]
order by $events[[2]] descending
return {"repo": $repository, "PushEvents": $events[[1]], "ForkEvents": $events[[2]], "name": $repository_name}

#### 5. In how many repos did the author Travis CI User commit?

In [None]:
%%jsoniq
count(
    for $record in json-file("git-archive.json")
    for $author in $record.payload.commits[].author.name
    where $author = "Travis CI User"
    let $repo := $record.repo.name
    group by $repo
    order by count($record) descending
    return {"repo": $repo, "count": count($record)}
)

#### 6. When did the last ForkEvent in the repo "bitcoin/bitcoin" happen?

In [None]:
%%jsoniq
for $event in json-file("git-archive.json")
where $event.type = "ForkEvent" and $event.repo.name = "bitcoin/bitcoin"
order by $event.created_at descending
return $event.created_at

#### 7. How many PullRequestEvents were issued in 2018?

In [None]:
%%jsoniq
count(distinct-values(
    for $record in json-file("git-archive.json")
    where $record.type = "PullRequestEvent" and substring($record.created_at, 1, 4) eq "2018"
    return $record.id
))

#### 8. How many commits did the author "SLE Merge Robot" make to the repo "yast/yast-translations"?

In [None]:
%%jsoniq
count(
    for $record in json-file("git-archive.json")
    where $record.repo.name eq "yast/yast-translations"
    for $author in $record.payload.commits[].author.name
    where $author eq "SLE Merge Robot"
    return $author
)

#### 9. What repo has the highest number of commits?

In [None]:
%%jsoniq
for $record in json-file("git-archive.json")
let $repo := $record.repo.name
group by $repo
let $comms := count($record.payload.commits[])
order by $comms descending
return {"repo": $repo, "comms": $comms}

#### 10. How many repos are there in the dataset?

In [None]:
%%jsoniq
count(distinct-values(
    for $record in json-file("git-archive.json")
    return $record.repo.name
))

#### 11. Give the login names of the two actors who committed to master the most in PushEvent events.

In [None]:
%%jsoniq
for $i in json-file("git-archive-big.json")
where $i.type eq "PushEvent" and $i.payload.ref eq "refs/heads/master"
let $name := $i.actor.login
let $commits := size($i.payload.commits)
group by $name
order by sum($commits) descending
return {"name": $name, "commits": sum($commits)}
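
The query returns every actor in descending order, so the answer is the first two rows. The following Python sketch shows the same aggregation with an explicit top-two cut on made-up sample events; the logins and the empty commit objects are placeholders.

```python
from collections import Counter

# Made-up sample events (assumption: same field layout as the dataset).
events = [
    {"type": "PushEvent", "actor": {"login": "alice"},
     "payload": {"ref": "refs/heads/master", "commits": [{}, {}, {}]}},
    {"type": "PushEvent", "actor": {"login": "bob"},
     "payload": {"ref": "refs/heads/master", "commits": [{}]}},
    {"type": "PushEvent", "actor": {"login": "carol"},
     "payload": {"ref": "refs/heads/dev", "commits": [{}, {}, {}, {}]}},
    {"type": "PushEvent", "actor": {"login": "bob"},
     "payload": {"ref": "refs/heads/master", "commits": [{}, {}, {}]}},
    {"type": "ForkEvent", "actor": {"login": "alice"}, "payload": {}},
]

commits_per_login = Counter()
for event in events:
    payload = event.get("payload", {})
    # Keep only pushes to master, as in the where clause of the query.
    if event["type"] == "PushEvent" and payload.get("ref") == "refs/heads/master":
        commits_per_login[event["actor"]["login"]] += len(payload.get("commits", []))

# most_common(2) sorts by total commits, descending, and keeps two entries.
top_two = commits_per_login.most_common(2)
print(top_two)  # prints: [('bob', 4), ('alice', 3)]
```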

#### 12. For how many repos do we have both a creation and deletion event in the data?

In [None]:
%%jsoniq
count(
    for $i in json-file("git-archive-big.json")
    where $i.type eq "CreateEvent" or $i.type eq "DeleteEvent"
    let $repo_id := $i.repo.id
    group by $repo_id
    where count(distinct-values($i.type)) eq 2
    return $repo_id)

In [None]:
%%jsoniq
count(distinct-values(
    for $record in json-file("git-archive-big.json")
    let $repo_id := $record.repo.id
    group by $repo_id
    let $x := [$record.type]
    where some $i in $x[] satisfies $i = "DeleteEvent"
    where some $i in $x[] satisfies $i = "CreateEvent"
    return $repo_id
))

In [None]:
%%jsoniq
count(distinct-values(
    for $repo_create in (
        for $record in json-file("git-archive-big.json")[$$.type = "CreateEvent"]
        return $record.repo.id
    )
    let $repo_create_delete := (
        for $record in json-file("git-archive-big.json")[$$.type = "DeleteEvent"]
        return $record.repo.id
    )[$$ = $repo_create]
    return $repo_create_delete
))

#### 13. Which repository has the highest number of people pushing to it?

In [None]:
%%jsoniq
for $i in json-file("git-archive-big.json")
where $i.type eq "PushEvent"
let $repo_id := $i.repo.id
group by $repo_id
order by count(distinct-values($i.actor.id)) descending
return {"repo_id": $repo_id, "How many people pushed?": count(distinct-values($i.actor.id))}
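
For comparison, the distinct-count-per-group pattern used here can be sketched in Python with one set of actor ids per repo. The sample events below are made up, and the variable names are illustrative.

```python
from collections import defaultdict

# Made-up sample events (assumption: same field layout as the dataset).
events = [
    {"type": "PushEvent", "repo": {"id": 10}, "actor": {"id": 1}},
    {"type": "PushEvent", "repo": {"id": 10}, "actor": {"id": 1}},
    {"type": "PushEvent", "repo": {"id": 10}, "actor": {"id": 2}},
    {"type": "PushEvent", "repo": {"id": 20}, "actor": {"id": 3}},
    {"type": "WatchEvent", "repo": {"id": 20}, "actor": {"id": 4}},
    {"type": "WatchEvent", "repo": {"id": 20}, "actor": {"id": 5}},
]

# One set of actor ids per repo: repeated pushes by the same actor count once.
pushers = defaultdict(set)
for event in events:
    if event["type"] == "PushEvent":
        pushers[event["repo"]["id"]].add(event["actor"]["id"])

# Rank repos by the number of distinct pushers, highest first.
ranking = sorted(
    ((repo_id, len(actor_ids)) for repo_id, actor_ids in pushers.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranking[0])  # prints: (10, 2)
```

Repo 20 has more events overall but only one distinct pusher, which is why counting `distinct-values($i.actor.id)` rather than `count($i)` is the right measure for this question.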