Store a list of current repos which use the JSON Schema topic and when they were created #4 #10

Merged 26 commits on Jun 21, 2024
Changes from 3 commits
Commits
b86c960
chore: Initiate GSoC qualification task, fix package.json and add CSV…
wcj617 Mar 22, 2024
d37ad55
feat#4: Retrieve and store date of first commit in CSV
wcj617 Mar 24, 2024
339d98c
refactor: Remove Wayback Machine checks and related data recording
wcj617 Mar 24, 2024
8dde501
chore: Re-add .gitignore file
wcj617 Mar 25, 2024
8a46389
fix: Correct eslint:fix script in package.json
wcj617 Mar 25, 2024
f96535c
test: Add integration tests for Github API calls and file writing
wcj617 Mar 27, 2024
483a164
feat: Enhance initial data script with error handling and additional …
wcj617 Mar 27, 2024
3f9c9fb
Delete projects/initial-data/initialTopicRepoData-1711072195857.csv
wcj617 Mar 27, 2024
584c825
Delete projects/initial-data/initialTopicRepoData-1711256938624.csv
wcj617 Mar 27, 2024
0786fde
Delete projects/initial-data/initialTopicRepoData-1711263053033.csv
wcj617 Mar 27, 2024
6761a8e
Fix integration test: Resolve Unix timestamp conversion error
wcj617 Mar 28, 2024
543ebf0
chore: Initiate GSoC qualification task, fix package.json and add CSV…
wcj617 Mar 22, 2024
2849433
feat#4: Retrieve and store date of first commit in CSV
wcj617 Mar 24, 2024
a2ca6ef
refactor: Remove Wayback Machine checks and related data recording
wcj617 Mar 24, 2024
cbbe873
chore: Re-add .gitignore file
wcj617 Mar 25, 2024
4f64dfd
fix: Correct eslint:fix script in package.json
wcj617 Mar 25, 2024
30a8a8c
test: Add integration tests for Github API calls and file writing
wcj617 Mar 27, 2024
af19955
feat: Enhance initial data script with error handling and additional …
wcj617 Mar 27, 2024
9e63001
Delete projects/initial-data/initialTopicRepoData-1711072195857.csv
wcj617 Mar 27, 2024
860921a
Delete projects/initial-data/initialTopicRepoData-1711256938624.csv
wcj617 Mar 27, 2024
0e0dd5b
Delete projects/initial-data/initialTopicRepoData-1711263053033.csv
wcj617 Mar 27, 2024
b5e2c96
Fix integration test: Resolve Unix timestamp conversion error
wcj617 Mar 28, 2024
faecfd7
Fix improper error handling
wcj617 May 24, 2024
3dd0683
Merge remote-tracking branch 'origin'
wcj617 May 24, 2024
8f8b2cb
Improve testing by checking a file is created with the correct value …
Relequestual Jun 21, 2024
f02866c
Merge branch 'main' into main
Relequestual Jun 21, 2024
145 changes: 0 additions & 145 deletions projects/initial-data/.gitignore

This file was deleted.

2 changes: 1 addition & 1 deletion projects/initial-data/dataRecorder.js
@@ -10,7 +10,7 @@ export class DataRecorder {
     if (!fs.existsSync(this.fileName)) {
       fs.writeFileSync(
         this.fileName,
-        'repo,repo_topics,creation,archive_url_creation,topic_present_creation,release,archive_url_release,topic_present_release\n',
+        'repo,repo_topics,date_first_commit,creation,release\n',
         'utf8',
       );
     }
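
For reference, a minimal sketch of how a row matching the new five-column header might be serialized. `CSV_HEADER`, `escapeCsvField` and `toCsvRow` are hypothetical names and not part of this PR. The diff does not show how `DataRecorder.appendToCSV` formats its values, so this only illustrates conventional CSV quoting under that assumption.

```javascript
// Hypothetical helpers for illustration only; not taken from this PR.
const CSV_HEADER = ['repo', 'repo_topics', 'date_first_commit', 'creation', 'release'];

// Quote a field only when it contains a comma, quote, or line break.
// null/undefined become an empty field (e.g. a repo with no release).
function escapeCsvField(value) {
  if (value === null || value === undefined) return '';
  const text = String(value);
  return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Join already-ordered values into one CSV line (without the trailing newline).
function toCsvRow(values) {
  return values.map(escapeCsvField).join(',');
}

console.log(toCsvRow(CSV_HEADER));
console.log(
  toCsvRow([
    'ajv-validator/ajv',
    'json-schema, validator, ajv',
    '2015-05-19T23:23:32Z',
    '2015-05-19T23:23:32Z',
    '2015-06-13T12:01:05Z',
  ]),
);
```

Quoting inside the helper would mean callers pass raw strings, rather than pre-wrapping the topics list in quotes themselves.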
11 changes: 11 additions & 0 deletions projects/initial-data/initialTopicRepoData-1711072195857.csv
@@ -0,0 +1,11 @@
repo,repo_topics,creation,archive_url_creation,topic_present_creation,release,archive_url_release,topic_present_release
tiangolo/fastapi,"python,json,swagger-ui,redoc,starlette,openapi,api,openapi3,framework,async,asyncio,uvicorn,python3,python-types,pydantic,json-schema,fastapi,swagger,rest,web",1544257307000,http://web.archive.org/web/20190204230405/https://github.com/tiangolo/fastapi,true,1544896765000,http://web.archive.org/web/20190204230405/https://github.com/tiangolo/fastapi,true
tiangolo/full-stack-fastapi-template,"python,json,json-schema,docker,postgresql,frontend,backend,fastapi,traefik,letsencrypt,swagger,jwt,openapi,chakra-ui,react,tanstack-query,tanstack-router,typescript,sqlmodel",1550934514000,http://web.archive.org/web/20240312213829/https://github.com/tiangolo/full-stack-fastapi-template,true,1552297741000,http://web.archive.org/web/20240312213829/https://github.com/tiangolo/full-stack-fastapi-template,true
pydantic/pydantic,"validation,parsing,json-schema,python37,python38,pydantic,python39,python,hints,python310,python311,python312",1493846638000,http://web.archive.org/web/20220808202853/https://github.com/pydantic/pydantic,true,1494082138000,http://web.archive.org/web/20220808202853/https://github.com/pydantic/pydantic,true
rjsf-team/react-jsonschema-form,"react,json-schema,forms,ui,web,json,data-validation",1450278845000,http://web.archive.org/web/20191003021746/https://github.com/rjsf-team/react-jsonschema-form,true,1450371513000,http://web.archive.org/web/20191003021746/https://github.com/rjsf-team/react-jsonschema-form,true
ajv-validator/ajv,"json-schema,validator,ajv",1432077812000,http://web.archive.org/web/20200515085719/https://github.com/ajv-validator/ajv,true,1434196865000,http://web.archive.org/web/20200515085719/https://github.com/ajv-validator/ajv,true
tiangolo/sqlmodel,"python,sql,sqlalchemy,pydantic,fastapi,json,json-schema",1629815213000,http://web.archive.org/web/20210824225156/https://github.com/tiangolo/sqlmodel,true,1629829555000,http://web.archive.org/web/20210824225156/https://github.com/tiangolo/sqlmodel,true
glideapps/quicktype,"json,typescript,json-schema,elm,java,swift,graphql,csharp,cplusplus,golang,objective-c,rust,kotlin",1499905370000,http://web.archive.org/web/20231011105817/https://github.com/glideapps/quicktype,true
alibaba/formily,"react,json-schema,form,validator,observable,reactive,schema-form,fusion,ant-design,vue,vue3,designable,react-native,json-schema-form,low-code,no-code,react-form,vue-form,form-builder",1546999905000,http://web.archive.org/web/20200514222830/https://github.com/alibaba/formily,true,1556265000000,http://web.archive.org/web/20200514222830/https://github.com/alibaba/formily,true
alibaba/x-render,"javascript,react,ant-design,json-schema,formrender,widget,webpack,ant,typescript,table,chart,list,form",1569484706000,http://web.archive.org/web/20210606023404/https://github.com/alibaba/x-render,true,1571190033000,http://web.archive.org/web/20210606023404/https://github.com/alibaba/x-render,true
joelittlejohn/jsonschema2pojo,"java,json-schema,json,jackson,gson,maven-plugin,ant-task,gradle-plugin",1371940133000,http://web.archive.org/web/20141222093656/https://github.com/joelittlejohn/jsonschema2pojo,false,1341268929000,http://web.archive.org/web/20141222093656/https://github.com/joelittlejohn/jsonschema2pojo,false
11 changes: 11 additions & 0 deletions projects/initial-data/initialTopicRepoData-1711256938624.csv
@@ -0,0 +1,11 @@
repo,repo_topics,date_first_commit,creation,archive_url_creation,topic_present_creation,release,archive_url_release,topic_present_release
tiangolo/fastapi,"python,json,swagger-ui,redoc,starlette,openapi,api,openapi3,framework,async,asyncio,uvicorn,python3,python-types,pydantic,json-schema,fastapi,swagger,rest,web",2018-12-05T06:56:50Z,1544257307000,http://web.archive.org/web/20190204230405/https://github.com/tiangolo/fastapi,true,1544896765000,http://web.archive.org/web/20190204230405/https://github.com/tiangolo/fastapi,true
tiangolo/full-stack-fastapi-template,"python,json,json-schema,docker,postgresql,frontend,backend,fastapi,traefik,letsencrypt,swagger,jwt,openapi,chakra-ui,react,tanstack-query,tanstack-router,typescript,sqlmodel",2019-02-09T15:42:36Z,1550934514000,http://web.archive.org/web/20240312213829/https://github.com/tiangolo/full-stack-fastapi-template,true,1552297741000,http://web.archive.org/web/20240312213829/https://github.com/tiangolo/full-stack-fastapi-template,true
pydantic/pydantic,"validation,parsing,json-schema,python37,python38,pydantic,python39,python,hints,python310,python311,python312",2017-05-03T21:23:41Z,1493846638000,http://web.archive.org/web/20220808202853/https://github.com/pydantic/pydantic,true,1494082138000,http://web.archive.org/web/20220808202853/https://github.com/pydantic/pydantic,true
rjsf-team/react-jsonschema-form,"react,json-schema,forms,ui,web,json,data-validation",2015-12-16T15:16:14Z,1450278845000,http://web.archive.org/web/20191003021746/https://github.com/rjsf-team/react-jsonschema-form,true,1450371513000,http://web.archive.org/web/20191003021746/https://github.com/rjsf-team/react-jsonschema-form,true
ajv-validator/ajv,"json-schema,validator,ajv",2015-05-19T23:23:32Z,1432077812000,http://web.archive.org/web/20200515085719/https://github.com/ajv-validator/ajv,true,1434196865000,http://web.archive.org/web/20200515085719/https://github.com/ajv-validator/ajv,true
tiangolo/sqlmodel,"python,sql,sqlalchemy,pydantic,fastapi,json,json-schema",2021-08-24T12:41:53Z,1629815213000,http://web.archive.org/web/20210824225156/https://github.com/tiangolo/sqlmodel,true,1629829555000,http://web.archive.org/web/20210824225156/https://github.com/tiangolo/sqlmodel,true
glideapps/quicktype,"json,typescript,json-schema,elm,java,swift,graphql,csharp,cplusplus,golang,objective-c,rust,kotlin",2017-07-13T00:22:10Z,1499905370000,http://web.archive.org/web/20231011105817/https://github.com/glideapps/quicktype,true
alibaba/formily,"react,json-schema,form,validator,observable,reactive,schema-form,fusion,ant-design,vue,vue3,designable,react-native,json-schema-form,low-code,no-code,react-form,vue-form,form-builder",2019-02-21T02:16:13Z,1546999905000,http://web.archive.org/web/20200514222830/https://github.com/alibaba/formily,true,1556265000000,http://web.archive.org/web/20200514222830/https://github.com/alibaba/formily,true
alibaba/x-render,"javascript,react,ant-design,json-schema,formrender,widget,webpack,ant,typescript,table,chart,list,form",2019-09-26T07:58:27Z,1569484706000,http://web.archive.org/web/20210606023404/https://github.com/alibaba/x-render,true,1571190033000,http://web.archive.org/web/20210606023404/https://github.com/alibaba/x-render,true
joelittlejohn/jsonschema2pojo,"java,json-schema,json,jackson,gson,maven-plugin,ant-task,gradle-plugin",2010-12-10T12:04:35Z,1371940133000,http://web.archive.org/web/20141222093656/https://github.com/joelittlejohn/jsonschema2pojo,false,1341268929000,http://web.archive.org/web/20141222093656/https://github.com/joelittlejohn/jsonschema2pojo,false
11 changes: 11 additions & 0 deletions projects/initial-data/initialTopicRepoData-1711263053033.csv
@@ -0,0 +1,11 @@
repo,repo_topics,date_first_commit,creation,release
tiangolo/fastapi,"python, json, swagger-ui, redoc, starlette, openapi, api, openapi3, framework, async, asyncio, uvicorn, python3, python-types, pydantic, json-schema, fastapi, swagger, rest, web",2018-12-05T06:56:50Z,2018-12-08T08:21:47Z,2018-12-15T17:59:25Z
tiangolo/full-stack-fastapi-template,"python, json, json-schema, docker, postgresql, frontend, backend, fastapi, traefik, letsencrypt, swagger, jwt, openapi, chakra-ui, react, tanstack-query, tanstack-router, typescript, sqlmodel",2019-02-09T15:42:36Z,2019-02-23T15:08:34Z,2019-03-11T09:49:01Z
pydantic/pydantic,"validation, parsing, json-schema, python37, python38, pydantic, python39, python, hints, python310, python311, python312",2017-05-03T21:23:41Z,2017-05-03T21:23:58Z,2017-05-06T14:48:58Z
rjsf-team/react-jsonschema-form,"react, json-schema, forms, ui, web, json, data-validation",2015-12-16T15:16:14Z,2015-12-16T15:14:05Z,2015-12-17T16:58:33Z
ajv-validator/ajv,"json-schema, validator, ajv",2015-05-19T23:23:32Z,2015-05-19T23:23:32Z,2015-06-13T12:01:05Z
tiangolo/sqlmodel,"python, sql, sqlalchemy, pydantic, fastapi, json, json-schema",2021-08-24T12:41:53Z,2021-08-24T14:26:53Z,2021-08-24T18:25:55Z
glideapps/quicktype,"json, typescript, json-schema, elm, java, swift, graphql, csharp, cplusplus, golang, objective-c, rust, kotlin",2017-07-13T00:22:10Z,2017-07-13T00:22:50Z,
alibaba/formily,"react, json-schema, form, validator, observable, reactive, schema-form, fusion, ant-design, vue, vue3, designable, react-native, json-schema-form, low-code, no-code, react-form, vue-form, form-builder",2019-02-21T02:16:13Z,2019-01-09T02:11:45Z,2019-04-26T07:50:00Z
alibaba/x-render,"javascript, react, ant-design, json-schema, formrender, widget, webpack, ant, typescript, table, chart, list, form",2019-09-26T07:58:27Z,2019-09-26T07:58:26Z,2019-10-16T01:40:33Z
joelittlejohn/jsonschema2pojo,"java, json-schema, json, jackson, gson, maven-plugin, ant-task, gradle-plugin",2010-12-10T12:04:35Z,2013-06-22T22:28:53Z,2012-07-02T22:42:09Z
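
The two earlier CSV files record `creation` and `release` as Unix timestamps in milliseconds, while this one records ISO 8601 strings. A small sketch of converting between the two, assuming UTC throughout; `epochMsToIso` and `isoToEpochMs` are hypothetical helper names, not functions from this PR.

```javascript
// Hypothetical helpers for illustration only; not taken from this PR.

// Millisecond epoch -> ISO 8601 in UTC, dropping the ".000" milliseconds
// so the result matches the second-precision strings in the CSV above.
function epochMsToIso(epochMs) {
  return new Date(epochMs).toISOString().replace(/\.\d{3}Z$/, 'Z');
}

// ISO 8601 string -> millisecond epoch.
function isoToEpochMs(iso) {
  return Date.parse(iso);
}

// tiangolo/fastapi creation value from the first CSV file.
console.log(epochMsToIso(1544257307000)); // 2018-12-08T08:21:47Z
console.log(isoToEpochMs('2018-12-08T08:21:47Z')); // 1544257307000
```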
107 changes: 35 additions & 72 deletions projects/initial-data/main.js
@@ -1,10 +1,8 @@
 import { Octokit } from 'octokit';
-import cheerio from 'cheerio';
 import { getInput } from './setup.js';

 import { DataRecorder } from './dataRecorder.js';

-const WAYBACK_API_URL = 'http://archive.org/wayback/available';
 const CSV_FILE_NAME = `initialTopicRepoData-${Date.now()}.csv`;

 async function fetchRepoCreationDate(octokit, owner, repo) {
@@ -16,6 +14,26 @@ async function fetchRepoCreationDate(octokit, owner, repo) {
   return response.data.created_at;
 }

+async function fetchFirstCommitDate(octokit, owner, repo) {
+  console.log(`Fetching first commit date for repository: ${owner}/${repo}`);
+  const response = await octokit.request('GET /repos/{owner}/{repo}/commits', {
+    owner,
+    repo,
+    per_page: 1,
+  });
+
+  const lastPageUrl = response.headers.link?.match(
+    /<([^>]+)>;\s*rel="last"/,
+  )?.[1];
+
+  if (!lastPageUrl) {
+    return response.data.length > 0 ? response.data[0].commit.author.date : null;
+  }
+
+  const lastPageResponse = await octokit.request(lastPageUrl);
+  return lastPageResponse.data.length > 0 ? lastPageResponse.data[0].commit.author.date : null;
+}
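
For reviewers unfamiliar with the pagination approach used in `fetchFirstCommitDate`: with `per_page: 1` each page holds a single commit, and because the commits endpoint lists newest commits first, the page marked `rel="last"` in the `Link` header should hold the oldest one. The sketch below isolates just the header parsing. `parseLastPageUrl` and the sample header value are hypothetical and only illustrate the regular expression used in the diff.

```javascript
// Hypothetical helper for illustration only; not taken from this PR.
// Returns the URL tagged rel="last" in a Link header, or null when the
// header is missing or has no "last" relation (i.e. a single page).
function parseLastPageUrl(linkHeader) {
  if (!linkHeader) return null;
  const match = linkHeader.match(/<([^>]+)>;\s*rel="last"/);
  return match ? match[1] : null;
}

// Made-up header in the shape of a paginated response.
const sampleHeader =
  '<https://api.github.com/repositories/1/commits?per_page=1&page=2>; rel="next", ' +
  '<https://api.github.com/repositories/1/commits?per_page=1&page=57>; rel="last"';

console.log(parseLastPageUrl(sampleHeader));
console.log(parseLastPageUrl(undefined));
```

When no `rel="last"` link is present, the first response already contains the only commit, which is the case the `if (!lastPageUrl)` branch handles.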

async function fetchRepoTopics(octokit, owner, repo) {
console.log(`Fetching topics for repository: ${owner}/${repo}`);
const response = await octokit.request('GET /repos/{owner}/{repo}/topics', {
@@ -46,84 +64,30 @@ async function fetchFirstReleaseDate(octokit, owner, repo) {
     : null;
 }

-async function fetchWaybackSnapshot(url, timestamp) {
-  console.log(
-    `Fetching Wayback Machine snapshot for URL: ${url} at timestamp: ${timestamp}`,
-  );
-  console.log(`${WAYBACK_API_URL}?url=${url}&timestamp=${timestamp}`);
-  const response = await fetch(
-    `${WAYBACK_API_URL}?url=${url}&timestamp=${timestamp}`,
-  );
-  const data = await response.json();
-  return data.archived_snapshots;
-}
-
-async function checkTopicInPage(url, topic) {
-  console.log(`Checking if topic "${topic}" exists in page: ${url}`);
-  const response = await fetch(url);
-  const html = await response.text();
-  const $ = cheerio.load(html);
-  return $(`a.topic-tag-link:contains('${topic}')`).length > 0;
-}
-
-async function processRepository(octokit, owner, repo, topic) {
+async function processRepository(octokit, owner, repo) {
   console.log(`Processing repository: ${owner}/${repo}`);
   const githubRepoURL = `https://github.com/${owner}/${repo}`;

   const creationDate = await fetchRepoCreationDate(octokit, owner, repo);
   const firstReleaseDate = await fetchFirstReleaseDate(octokit, owner, repo);
   const repoTopics = await fetchRepoTopics(octokit, owner, repo);
+  const firstCommitDate = await fetchFirstCommitDate(octokit, owner, repo);
   console.log({ firstReleaseDate });
   if (firstReleaseDate === null) {
     console.log(`First release date: of ${githubRepoURL} unknown`);
   }

-  const dateTypes = [
-    ['creation', creationDate],
-    ...(firstReleaseDate !== null ? [['release', firstReleaseDate]] : []),
-  ];
-  console.log({ dateTypes });
-
-  const dataSets = dateTypes.map(async ([dateType, isoDate]) => {
-    if (isoDate) {
-      console.log(`Processing ${dateType} date: ${isoDate}`);
-      const date = new Date(isoDate);
-      const datestamp = date.getTime();
-      const archivedSnapshots = await fetchWaybackSnapshot(
-        githubRepoURL,
-        datestamp,
-      );
-      if (Object.keys(archivedSnapshots).length === 0) {
-        console.log(`Unable to find archive for ${githubRepoURL}`);
-      } else {
-        const archiveUrl = archivedSnapshots.closest.url;
-        if (archiveUrl) {
-          const topicExists = await checkTopicInPage(archiveUrl, topic);
-          return {
-            [`datestamp_${dateType}`]: datestamp,
-            [`archiveUrl_${dateType}`]: archiveUrl,
-            [`topicExists_${dateType}`]: topicExists,
-          };
-        } else {
-          console.error(
-            `Couldn't get closest archive URL given response from ${githubRepoURL}`,
-          );
-        }
-      }
-    }
-  });
-
-  const combinedData = await Promise.all(dataSets);
-
-  const singleRowData = combinedData.reduce(
-    (acc, cur) => {
-      if (cur) {
-        return { ...acc, ...cur };
-      }
-      return acc;
-    },
-    { repository: `${owner}/${repo}`, repoTopics: `"${repoTopics.join(',')}"` },
-  );
-
+  if (firstCommitDate === null) {
+    console.log(`First commit date: of ${githubRepoURL} unknown`);
+  }
+
+  const singleRowData = {
+    repository: `${owner}/${repo}`,
+    repoTopics: `"${repoTopics.join(', ')}"`,
+    date_first_commit: firstCommitDate,
+    creation: creationDate,
+    release: firstReleaseDate,
+  };
+
   return singleRowData;
 }
@@ -148,7 +112,6 @@ async function main(token, topic, numRepos) {
         octokit,
         repo.owner.login,
         repo.name,
-        topic,
       );
       console.log({ dataRow });
       dataRecorder.appendToCSV(Object.values(dataRow));
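
`Object.values(dataRow)` depends on the object's key insertion order lining up with the CSV header written by `DataRecorder`. A hedged sketch of making that order explicit instead; `COLUMNS` and `rowInColumnOrder` are hypothetical names and not part of this PR, and the key names are the ones used in `singleRowData`.

```javascript
// Hypothetical helper for illustration only; not taken from this PR.
// Keys of the row object, listed in the same order as the CSV header
// 'repo,repo_topics,date_first_commit,creation,release'.
const COLUMNS = ['repository', 'repoTopics', 'date_first_commit', 'creation', 'release'];

// Pick values by column name so the output order no longer depends on
// how the row object happened to be constructed. Missing keys become null.
function rowInColumnOrder(dataRow, columns = COLUMNS) {
  return columns.map((key) => dataRow[key] ?? null);
}

// Same fields as singleRowData, deliberately built in a different order.
const shuffledRow = {
  release: '2015-06-13T12:01:05Z',
  repository: 'ajv-validator/ajv',
  creation: '2015-05-19T23:23:32Z',
  repoTopics: '"json-schema, validator, ajv"',
  date_first_commit: '2015-05-19T23:23:32Z',
};

console.log(rowInColumnOrder(shuffledRow));
```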
@@ -165,4 +128,4 @@ console.log(
   `Starting process with token: REDACTED, topic: ${topic}, numRepos: ${numRepos}`,
 );

-main(token, topic, numRepos);
+main(token, topic, numRepos);
2 changes: 1 addition & 1 deletion projects/initial-data/package.json
@@ -8,7 +8,7 @@
   "scripts": {
     "test": "echo \"Error: no test specified\" && exit 1",
     "eslint": "eslint . --ext js",
-    "eslint:fix": "pnpm run eslint -- --fix"
+    "eslint:fix": "eslint . --ext js --fix"
   },
   "keywords": [],
   "author": "Ben Hutton",
11 changes: 11 additions & 0 deletions projects/initial-data/processed_data.csv
@@ -0,0 +1,11 @@
0 1
1371940133 1
1432077812 1
1450278845 1
1493846638 1
1499905370 1
1544257307 1
1546999905 1
1550934514 1
1569484706 1
1629815213 1