Merge branch 'main' into Update-API
ishaan812 committed Nov 5, 2023
2 parents 8bb09ec + aa559b9 commit 90f08e3
Showing 6 changed files with 91 additions and 4 deletions.
58 changes: 54 additions & 4 deletions README.md
@@ -1,13 +1,63 @@


# GitHub OpenAPI Search

The goal of this project is to provide a robust yet easy way to search GitHub for Swagger and OpenAPI definitions. There is a lot of noise in the available results, we only care about OpenAPI definitions that validate, and the GitHub API has rate limits that require the crawling to be automated over time. The project therefore provides a robust open-source solution that crawls public GitHub repositories for machine-readable API definitions.
The project will consist of developing an open-source API that accepts search parameters and uses the GitHub API to perform the search. It simplifies the search interface, makes rate limits visible as part of the response, and conducts the search asynchronously: the user makes one call to initiate a search, then separate calls to receive results as they come in.
The goal of this project is to provide a robust yet easy way to search GitHub for OpenAPI and Swagger definitions. There is a lot of noise in the available results, we only care about OpenAPI definitions that validate, and the GitHub API has rate limits that require the crawling to be automated over time. The project therefore provides a robust open-source solution that crawls public GitHub repositories for machine-readable API definitions.
The project will consist of developing an open-source API that accepts search parameters and uses the GitHub API to perform the search. It simplifies the search interface and conducts the search asynchronously: the user makes one call to initiate a search, then separate calls to receive results as they come in.
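
The initiate-then-poll flow described above could be sketched roughly as follows. This is a minimal in-memory illustration, not the project's implementation: `SearchJobStore`, `SearchSnapshot`, and all method names are hypothetical, and a real server would call them from its route handlers and background crawl.

```typescript
// Hypothetical in-memory job store; none of these names come from the project.
type SearchStatus = "running" | "done";

interface SearchSnapshot {
  status: SearchStatus;
  results: string[];
}

class SearchJobStore {
  private jobs = new Map<number, SearchSnapshot>();
  private nextId = 1;

  // Called when the user initiates a search; returns an id to poll with.
  start(): number {
    const id = this.nextId++;
    this.jobs.set(id, { status: "running", results: [] });
    return id;
  }

  // Called by the background crawl each time a valid definition is found.
  // Results that arrive after the job is finished are ignored.
  addResult(id: number, result: string): void {
    const job = this.jobs.get(id);
    if (job && job.status === "running") {
      job.results.push(result);
    }
  }

  // Called once the crawl has finished.
  finish(id: number): void {
    const job = this.jobs.get(id);
    if (job) {
      job.status = "done";
    }
  }

  // Called on each follow-up request; returns a copy of what has arrived so far.
  poll(id: number): SearchSnapshot | undefined {
    const job = this.jobs.get(id);
    return job ? { status: job.status, results: job.results.slice() } : undefined;
  }
}
```

A caller would keep polling with the returned id until the snapshot's `status` is `"done"`; returning a copy keeps callers from mutating the stored results.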

## Tech Stack

- Node JS/Express JS
- Typescript
- Octokit.JS
- Jest
- Jest (For testing)
- Docker
- Python (Scripting)
- ElasticSearch

## Dev Runbook
Dependencies: NodeJS 19, npm, GitHub API key
How to get a GitHub API key: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens

## Setting up OpenAPI Search with Docker Compose

1. Clone the repository to your local setup
2. Make sure you have Docker installed locally.
3. Run `docker compose up`
4. Two containers should now have started: Elasticsearch (the database container) and an instance of the server.
5. To load the database with OpenAPI files, run
`python scripts/seed_script.py` from the root of the repository. (Takes around 2-3 hours.)
(To configure the list of organisations, edit scripts/assets/organisations1.txt; scripts/assets/organisations2.txt holds the next 1000 organisations.)

## Setting up the server manually

1. Clone the repository to your local setup
2. Run `npm i`
3. Make a `.env` file in the project directory and add the variables:
**PORT**= **(port number on which to host the API)**
**GITHUB_API_KEY**= **(GitHub API key)**
**ES_HOST**= **(location of the Elasticsearch database)**
4. Run `npm run build:watch` in one terminal.
5. In another terminal, run `npm run start` to start the server on the specified port.
6. The Node.js server should now be running. To test it, go to `localhost:{{PORT}}`, where you will see the admin panel through which you can interact with some of the APIs.
7. To load the database with OpenAPI files, run
`python scripts/seed_script.py` from the root of the repository. (Takes around 2-3 hours.)
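
One way to catch a missing or malformed `.env` entry early is to validate the variables at startup. The sketch below is only an illustration: the variable names `PORT`, `GITHUB_API_KEY`, and `ES_HOST` come from the steps above, while `ServerConfig` and `loadConfig` are hypothetical names and not part of the project.

```typescript
// Hypothetical config loader; only the three variable names come from the README.
interface ServerConfig {
  port: number;
  githubApiKey: string;
  esHost: string;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  // Collect every missing variable so the error names all of them at once.
  const missing = ["PORT", "GITHUB_API_KEY", "ES_HOST"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const port = Number(env.PORT);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }

  return {
    port,
    githubApiKey: env.GITHUB_API_KEY as string,
    esHost: env.ES_HOST as string,
  };
}
```

In a Node.js server this could be called as `loadConfig(process.env)` once the `.env` file has been loaded into the environment.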

## Setting up ElasticSearch locally (Manually)
1. docker pull docker.elastic.co/elasticsearch/elasticsearch:8.8.2
2. docker network create elastic
3. docker run \
-p 9200:9200 \
-p 9300:9300 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
docker.elastic.co/elasticsearch/elasticsearch:8.8.2

## Loading Details
Currently, we only index OpenAPI files from the 1000 most popular organisations on GitHub (based on stars). More organisations can be indexed by adding them to the `scripts/assets/organisations.txt` file.
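
As a rough illustration of how such an organisation list might be read, the sketch below assumes the file holds one organisation name per line; the actual file format is not shown here, and `parseOrganisations` is a hypothetical helper rather than project code.

```typescript
// Hypothetical helper; assumes one organisation name per line.
function parseOrganisations(fileContents: string): string[] {
  const names = fileContents
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "");

  // Keep only the first occurrence of each name so no organisation is crawled twice.
  return names.filter((name, index) => names.indexOf(name) === index);
}
```

Dropping blank lines and duplicates before seeding avoids spending GitHub API quota on repeated or empty queries.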


## API Endpoints
[![Run in Postman](https://run.pstmn.io/button.svg)](https://app.getpostman.com/run-collection/19841716-f1801bb7-b189-429b-a875-91b115d349a2?action=collection%2Ffork&source=rip_markdown&collection-url=entityId%3D19841716-f1801bb7-b189-429b-a875-91b115d349a2%26entityType%3Dcollection%26workspaceId%3D5ebe19fb-61d4-47a7-9cae-de3834853f6b)


🚧Under Construction
2 changes: 2 additions & 0 deletions scripts/seed_script.py
@@ -7,6 +7,7 @@
def call_local_endpoint(prompt):
    #TODO: Change this to the correct URL when activesearch endpoint is changed
    url = f'http://localhost:8080/database?rootquery="{prompt}"'


    try:
        response = requests.get(url)
@@ -28,4 +29,5 @@ def call_local_endpoint(prompt):
#Get Swagger files
# call_local_endpoint('"swagger: \"2"')


#PS: Takes a long time to run
1 change: 1 addition & 0 deletions src/DB/dbutils.ts
@@ -74,3 +74,4 @@ export async function GetDocumentWithId(Id:string): Promise<any> {
}
}


25 changes: 25 additions & 0 deletions src/app.ts
@@ -20,6 +20,7 @@ const octokit = new CustomOctokit({
`Request quota exhausted for request ${options.method} ${options.url}`,
);
console.info(`Retrying after ${retryAfter} seconds!`);

return true;
},
onSecondaryRateLimit: (retryAfter, options, octokit) => {
@@ -47,6 +48,27 @@ app.get('/search', async (_req, _res) => {

//openapi2db
app.post('/openapi', async (_req, _res) => {
=======


const esClient = new es.Client({
  host: 'http://localhost:9200',
  log: 'trace',
});

// Should not even be an API endpoint for passive search
// Should just fetch repositories and go through them (with ETAG to make sure no repeats)
// Check for openapi.json in the contents of the repository
// If it exists, then store in database with important content


app.use('/passive', async (_req, _res) => {
  const query = _req.query.q as string;
  const results = await passiveSearch(query, esClient);
  _res.send(results);
});

app.use('/search', async (_req, _res) => {
const Repository = _req.query.repo as string;
const Organisation = _req.query.org as string;
const User = _req.query.user as string;
@@ -58,10 +80,12 @@ app.post('/openapi', async (_req, _res) => {
Organisation as string,
User as string,
RootQuery as string,
esClient as any,
);
_res.send(results);
});


app.put('/openapi', async (_req, _res) => {
const results = await UpdateOpenAPIFiles();
_res.send(results);
@@ -72,6 +96,7 @@ app.use('/ping', async (_req, _res) => {
_res.send(response);
});


app.get('/', (_req, _res) => {
_res.send('TypeScript With Express');
});
6 changes: 6 additions & 0 deletions src/searchtools/search.ts
@@ -4,6 +4,7 @@ import { octokit, esClient } from '../app.js';
let processCount = 0;
let finishedCount = 0;


export async function activeSearch(
prompt: string,
repo: string,
@@ -68,6 +69,7 @@ export async function activeSearch(
finishedCount,
);
console.info('Waiting for all files to be processed');

}
return validFiles;
}
@@ -85,6 +87,7 @@ export async function passiveSearch(
query: {
simple_query_string: {
query: query,

fields: ['title^3', 'servers^2', 'paths^1.5', 'data^1'],
default_operator: 'and',
},
@@ -100,13 +103,16 @@ export async function passiveSearch(
}
} catch (error) {
if (error.message.includes('No Living connections')) {

console.error('Elasticsearch connection error:', error);
return error;
} else {
console.error('Error occurred during passive search:', error);
return error;

}
}

return 'Database not found';
}

3 changes: 3 additions & 0 deletions src/searchtools/searchutils.ts
@@ -5,6 +5,7 @@ import OASNormalize from 'oas-normalize';
import { BulkStoreToDB } from '../DB/dbutils.js';

export function generateUUID(): string {

// Generate a random buffer of 16 bytes
const buffer = crypto.randomBytes(16);

@@ -67,6 +68,7 @@ export async function queryBuilder(
return query;
}


export async function ValidateandStoreFiles(
files: any[],
): Promise<any> {
@@ -122,5 +124,6 @@ export async function ValidateandStoreFiles(
});
}
BulkStoreToDB(validFiles as any[]);

return validFiles;
}
