OpenAPI Web Search #7

jansche · 2023-02-03T09:53:30Z

Summary: Develop an open-source approach to finding Swagger and OpenAPI definitions on the open web: crawl web pages looking for API definitions, validate them, and then consume and index them as part of an ongoing search. This gives developers a simple way to find existing APIs through their documentation, repositories, and other common aspects of running an API.

Skills: Knowledge of the web, and how to crawl web pages, follow URLs, or utilize an existing solution like Common Crawl.

Expected Outcomes: Provide a simple open-source API that abstracts away the complexity of searching the web for specific terms, helping identify APIs in a sea of web pages. Provide a simple interface that sets in motion an asynchronous search of the web, or of a corpus of web content, looking for APIs. Users can initiate a search and then return regularly to see its results build up over time, with the results aggregated for easy pulling via a simple API.
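The "start a search, then come back for results" flow described above could look roughly like the following. This is only a minimal in-memory sketch: `SearchService`, `SearchJob`, and all method names are hypothetical and not taken from the project, and a real implementation would sit behind an HTTP API with persistent storage.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class SearchJob:
    """State of one asynchronous search (hypothetical structure)."""
    query: str
    status: str = "running"
    results: list = field(default_factory=list)


class SearchService:
    """In-memory stand-in for the search API; names are illustrative."""

    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)

    def start_search(self, query: str) -> int:
        # Register the job and return an ID the user can poll later.
        job_id = next(self._ids)
        self._jobs[job_id] = SearchJob(query)
        return job_id

    def add_result(self, job_id: int, url: str) -> None:
        # Called by a crawler worker whenever it finds a definition.
        job = self._jobs[job_id]
        if url not in job.results:  # keep aggregated results unique
            job.results.append(url)

    def finish(self, job_id: int) -> None:
        self._jobs[job_id].status = "done"

    def get_results(self, job_id: int) -> dict:
        # Safe to call at any time; returns whatever has been found so far.
        job = self._jobs[job_id]
        return {"query": job.query, "status": job.status,
                "results": list(job.results)}
```

A user would call `start_search("payments")` once and then call `get_results(job_id)` periodically while workers keep adding results.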

Possible mentors: @vinitshahdeo + 1-2 additional mentors

Project Repo: https://github.com/postman-open-technologies/openapi-web-search

Size of Project: 175h

Rating: Medium skill level

jansche · 2023-02-03T09:54:32Z

Could this be merged with #6 and become a 175h project? @kinlane

BabyElias · 2023-02-22T18:41:18Z

Hey!
So I have been using Postman for quite some time now (always my go-to for visualising API outputs in the best way possible) and I find this idea really exciting.
Quick question: this requires knowledge of web scraping, right? How can I go about discussing this task with potential mentors and seeking guidance on it?

Prajwalprakash3722 · 2023-02-23T16:06:46Z

Hello @jansche, this project looks really interesting. Please correct me if I have misunderstood the statement.

I am comparing this with API marketplaces. The proposed solution aims to help developers find APIs that may not be available on existing API marketplaces like RapidAPI by crawling the web for Swagger and OpenAPI definitions, indexing them, and providing access through a simple API. This can make it easier for developers to discover and use APIs that are not part of any marketplace but could be relevant to their specific use case.

The idea of crawling the web to find all the Swagger and OpenAPI definitions out there sounds like a Herculean task. Can you tell me more about how we plan to make it happen? Are we talking about building an army of web crawlers ourselves, or do you have something else in mind?

Nevertheless, exciting stuff!

jansche · 2023-02-24T08:07:18Z

Hi folks,
we're currently coordinating mentors and will provide more details and answer questions at the beginning of next week (week of February 27). Please bear with us.
Best regards
Jan

Prajwalprakash3722 · 2023-02-24T13:12:21Z

Cool :)

ankit-pn · 2023-02-25T06:51:45Z

Greetings everyone! I'm Ankit Kumar, 3rd year CS student at NIT Bhopal. I am excited about this project and find it to be particularly intriguing. Based on the project summary, it seems like the goal is to create a search engine for OpenAPI and Swagger, which will provide reliable and functional APIs. It is important to validate every OpenAPI and Swagger definition to ensure its reliability and accuracy.

I have experience using the Common Crawl Index and have previously worked with OpenAPI on my own side projects, and I would love to contribute to this project as a GSoC mentee this summer.

ankit-pn · 2023-02-25T07:32:40Z

I think merging it with #8 will be a good idea for a 175h project!!

Kd-Here · 2023-02-26T06:44:51Z

I know how to crawl the web and use Common Crawl.
Let us know when mentors are assigned for the task; I am waiting for it.

destrex271 · 2023-02-28T08:46:11Z

This idea seems great! Can't wait to work on it.

ankit-pn · 2023-03-01T01:35:27Z

Is any mentor assigned to this project yet @jansche ?

vinitshahdeo · 2023-03-01T13:16:54Z

Hey @ankit-pn, Glad to see you here. I will be mentoring this project.

Prajwalprakash3722 · 2023-03-01T14:13:00Z

Hey @vinitshahdeo was my assumption correct?

#7 (comment)

ankit-pn · 2023-03-01T14:45:25Z

I am glad to see you as mentor, @vinitshahdeo.

I do have some doubts regarding this project.

There are 2 ways to get OpenAPI definitions from the open web:

Crawling the web with self-made crawlers (spiders)
Using the Common Crawl dataset (Common Crawl updates its dataset every month)

For both approaches we need to define a list of sites (e.g. apis.guru, github.com, and other sites where OpenAPI definitions are likely to be found).

Although we could use the whole Common Crawl dataset to look for OpenAPI definitions (without defining a list of sites), the dataset is huge (around 300TB), and scraping OpenAPI definitions from it and storing them to build a search engine would be computationally very expensive, imo.

Is there any workaround for this?
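One possible workaround, sketched here under assumptions: Common Crawl publishes a URL index that can be queried per crawl, so candidate URLs can be shortlisted without downloading the full dataset. The collection ID `CC-MAIN-2023-06`, the list of file names, and both helper functions below are illustrative choices, not part of the project. The sketch only builds the query URL and filters index records; fetching is left out.

```python
import json
from urllib.parse import urlencode

# Assumption: file names under which definitions are commonly published.
SPEC_FILENAMES = (
    "openapi.json", "openapi.yaml", "openapi.yml",
    "swagger.json", "swagger.yaml", "swagger.yml",
)


def build_index_query(collection: str, url_pattern: str) -> str:
    """Build a Common Crawl index query URL for one crawl collection."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"https://index.commoncrawl.org/{collection}-index?{params}"


def filter_candidates(index_lines):
    """Keep index records whose URL path ends in a known spec file name.

    `index_lines` is an iterable of JSON strings, one record per line,
    which is how the index returns results with output=json.
    """
    candidates = []
    for line in index_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Drop any query string before comparing the file name.
        path = record.get("url", "").split("?", 1)[0].lower()
        if path.endswith(SPEC_FILENAMES):
            candidates.append(record)
    return candidates
```

Each surviving record carries `filename`, `offset`, and `length` fields, which identify the byte range of that single capture inside a crawl archive, so only the shortlisted documents need to be downloaded.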

priyanshu-kun · 2023-03-01T15:59:26Z

> Hey @vinitshahdeo was my assumption correct?
>
> #7 (comment)

I have the same question; please clarify.

priyanshu-kun · 2023-03-02T06:33:20Z

Hey devs,
My name is Priyanshu Sharma and I've done my bachelor's in computer science. I'm really excited about this project and I find it a perfect match for my current skills. If I got it right, the assignment asks us to find Swagger and OpenAPI definitions and list them in a frontend web application. Does that application work like a search engine for Swagger and OpenAPI definitions?

Overall, the project is really interesting, and the mentor can count on me.

vinitshahdeo · 2023-03-02T11:27:19Z

Hello everyone,

Glad to see the engagement here. In a nutshell, the idea is to build a search engine for valid API Definitions. Happy to hear thoughts from you all before we share our roadmap. The concrete roadmap will be shared once we create a dedicated repository for the same.

PS: We love your ideas—let's brainstorm! Keep sharing your approaches along with the pros and cons. Heads up! Please think about API First and consider an end-to-end solution from the backend to the user interface.
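As a starting point for discussing what "valid" could mean, here is a minimal structural pre-check, assuming JSON input only. It is not full schema validation, and `classify_definition` is a hypothetical helper name. It relies on the required top-level fields: `swagger: "2.0"` with `info` and `paths` for Swagger 2.0, `openapi`, `info`, and `paths` for OpenAPI 3.0, and for OpenAPI 3.1 `info` plus at least one of `paths`, `components`, or `webhooks`.

```python
import json


def classify_definition(text: str):
    """Return 'swagger-2.0', 'openapi-3.0', 'openapi-3.1', or None.

    Only checks required top-level fields; a document that passes
    still needs full validation before it is indexed.
    """
    try:
        doc = json.loads(text)
    except ValueError:  # not JSON at all
        return None
    if not isinstance(doc, dict) or not isinstance(doc.get("info"), dict):
        return None

    has_paths = isinstance(doc.get("paths"), dict)
    if doc.get("swagger") == "2.0" and has_paths:
        return "swagger-2.0"

    version = doc.get("openapi")
    if isinstance(version, str):
        if version.startswith("3.0.") and has_paths:
            return "openapi-3.0"
        if version.startswith("3.1.") and any(
            key in doc for key in ("paths", "components", "webhooks")
        ):
            return "openapi-3.1"
    return None
```

YAML definitions would need a YAML parser in front of the same checks.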

priyanshu-kun · 2023-03-03T04:15:06Z

@vinitshahdeo, could you please help me? I feel conflicted here: there are two ways to fetch OpenAPI definitions, web scraping and Common Crawl, and both options have their pros and cons.
Web scraping might be a good option if you only need to extract data from a few websites and have the technical know-how to set up and manage a scraping solution; it also gives more control over the data. Common Crawl might be the better option if you need to extract data from a lot of websites while avoiding legal pitfalls, but it doesn't give as much control over the data.

vishvjeet-thakur · 2023-03-03T13:55:29Z

Hey @vinitshahdeo, I'm Vishvjeet. I think using Common Crawl to get data from most websites would be more efficient, since we have to find as many OpenAPI definitions as we can, and using the already available dataset will also save us time.

ankit-pn · 2023-03-03T14:14:39Z

I think using Common Crawl or self-made crawl bots doesn't make much difference to the complexity of the problem we have to deal with. Common Crawl contains either raw HTML data or plain text extracted from those HTML pages (using the plain-text data only makes sense for NLP-related work), and extracting openapi.yaml/openapi.json files will be easier (at least for me) from raw HTML files than from plain-text data.

For me, getting raw HTML data from the open web with self-made crawl bots or with Common Crawl is of the same complexity; scraping those HTML pages for OpenAPI definitions is the really tough part.

What do you say, @vinitshahdeo? Also, if there will be a Slack or Discord channel for further communication on this project, it would be extremely beneficial.
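To make the HTML-scraping step concrete, here is a rough sketch of pulling candidate definition links out of a raw HTML page using only the Python standard library. The suffix list and the names `SpecLinkParser` and `find_spec_links` are assumptions for illustration; definitions embedded in scripts or served from unconventional URLs would not be caught this way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: definitions are linked under conventional file names.
SPEC_SUFFIXES = (
    "openapi.json", "openapi.yaml", "openapi.yml",
    "swagger.json", "swagger.yaml", "swagger.yml",
)


class SpecLinkParser(HTMLParser):
    """Collects <a href> targets that look like OpenAPI/Swagger files."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        if urlparse(absolute).path.lower().endswith(SPEC_SUFFIXES):
            self.links.append(absolute)


def find_spec_links(html: str, base_url: str) -> list:
    parser = SpecLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

The same function works on pages fetched by a custom crawler and on raw HTML taken from Common Crawl, which is why the two sources end up with similar extraction effort.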

priyanshu-kun · 2023-03-03T16:35:20Z

@ankit-pn
I think this should be clarified soon, as we need to design the web app system and write a proposal for it.

destrex271 · 2023-03-04T10:33:39Z

@jansche are we supposed to use any specific language for this or is it open for us to choose?

MikeRalphson · 2023-03-04T10:47:37Z

@destrex271 the choice of technology stack will be up to the candidates.

simrann20 · 2023-03-07T17:31:46Z

Hey!

I am Simrann, a postgraduate student in CS and AI from IIITD. I have been using Postman for a long time and am very keen on contributing to it as a GSoC 2023 student. I am quite intrigued by this idea and have clarity on how to plan this project ahead.

Would love to contribute to this as a GSoC 2023 student under the guidance of @vinitshahdeo

ph1ne4s · 2023-03-09T10:44:24Z

Hey everyone! I am Aviral Jain, currently pursuing a B.Tech (2nd year) at IIT Roorkee and working on projects involving the MERN stack, Python, and C++.
I am also interested in cybersecurity and robotics.
I have been using Postman and would like to contribute to this project under GSoC 2023.

monstajoe2002 · 2023-03-09T12:26:23Z

Hello everyone, my name is Youssef Amr. I'm currently pursuing a major in Software Engineering and I love building new applications and working on projects, which I hope to do this year.
My experience in programming includes Java, JavaScript, Python, Rust and C++.
I also have a YouTube channel where I showcase some programming content as well.
My interests include tech and web development related things like frameworks and technologies.
I used Postman before and I want to know how to get involved in this GSoC organization possibly with @vinitshahdeo.

monstajoe2002 · 2023-03-10T12:11:48Z

Can you assign me this issue?

Rishabh42 · 2023-03-10T15:45:32Z

@KAWALMEET-SINGH I would suggest you be original; plagiarising won't get you anywhere.
I know open source can be a bit tempting, but please refrain from malpractices like copying others' code/comments.

Hope you understand

monstajoe2002 · 2023-03-10T20:40:51Z

I didn't plagiarize anyone's code

ankit-pn · 2023-03-11T12:34:00Z

@vinitshahdeo, since this is going to be a completely new project to implement, it would be great if you could specify the mandatory qualification task for it.

vinitshahdeo · 2023-03-13T09:49:31Z

👋 Hello everyone!

Glad to see all of you engaging here. We created the repo for this project - postman-open-technologies/openapi-web-search and tried our best to answer all of your questions in the README.md.

Please use this thread for any further doubts.

cc/ @ankit-pn @monstajoe2002 @simrann20 @Rishabh42 @ph1ne4s @destrex271 @priyanshu-kun @Prajwalprakash3722 @money8203 @BabyElias

khsh13 · 2023-03-15T15:30:05Z

Hello everyone, I am Khushi Sharma, a B.Tech 2nd year student at IGDTUW. I am a MERN stack web developer currently working with the Postman API. I have also done web3 projects.
I started my journey into APIs using Postman, and this project really excites me. Moreover, I will definitely learn a lot by getting into this project and working under the guidance of @vinitshahdeo for GSoC 2023.

MPrashanthR · 2023-03-17T08:41:54Z

Dear @vinitshahdeo

I am excited to announce that I have completed my training as a Full Stack Developer and I am eager to contribute my skills to open projects. I am particularly interested in the project proposal to develop an open-source approach to finding Swagger and OpenAPI definitions on the open web.

As a Full Stack Developer, I have experience with web development, crawling web pages, and following URLs. I am confident that I have the skills required to create a simple open-source API that will help developers find APIs in a sea of web pages.

I am looking for a mentor @vinitshahdeo @jansche who can guide me through this project and help me learn and grow as a developer. I am committed to putting in the time and effort required to complete this project successfully and contribute to the open source community.

Thank you for considering my application, and I look forward to hearing back from you soon.

vinitshahdeo · 2023-03-27T11:29:15Z

👋 Hello everyone,

I wrote a public blog post about this project idea - how OAWS can help unleash the power of OpenAPI!

Hope it helps—vinitshahdeo.dev/open-api-web-search

hemanth9398 · 2023-04-01T03:38:32Z

Dear @vinitshahdeo
I am excited to contribute to Postman's open source work. I have worked with CI/CD pipelines and have used Postman to make HTTP requests for applications, so I am familiar with working with Postman.
I am looking for a mentor, @vinitshahdeo @jansche, who can guide me through this project and help me develop a complete understanding of building applications with Postman. I am committed to putting in the time and effort required to complete this project successfully and contribute to the open source community.

ankit-pn · 2023-04-01T04:46:31Z

> Greetings everyone! I'm Ankit Kumar, 3rd year CS student at NIT Bhopal. I am excited about this project and find it to be particularly intriguing. Based on the project summary, it seems like the goal is to create a search engine for OpenAPI and Swagger, which will provide reliable and functional APIs. It is important to validate every OpenAPI and Swagger definition to ensure its reliability and accuracy.
>
> I have experience using the Common Crawl Index and have previously worked with OpenAPI on my own side projects, and I would love to contribute to this project as a GSoC mentee this summer.

@thelifeofshubh you just copied my whole introduction text. Plagiarism (at least in introducing yourself) will lead you nowhere, so just try to be authentic.

thelifeofshubh · 2023-04-01T05:06:14Z

Hey there, @ankit-pn! I was just curious to know if the mentor has been active lately. Have you noticed any recent activity from them?

Prajwalprakash3722 · 2023-04-01T06:05:02Z

> Hey there, @ankit-pn! I was just curious to know if the mentor has been active lately. Have you noticed any recent activity from them?

Yes, @MikeRalphson and @vinitshahdeo are pretty active.

DevMukhtarr · 2023-04-03T20:27:10Z

Hey @jansche,
As a backend developer who works with APIs a lot, I feel this project will really help developers who deal with APIs, whether they are starting a new project or fixing issues in an ongoing one. I have experience in web crawling, which is one of the main skills required to make this successful, and I'll be glad to work on this project.

LordRona · 2023-11-15T04:55:07Z

Hello!
My name is Fon Ronard Sauh, a third-year computer science major at the University of Buea. I have worked on projects that required API calls, and I used Postman throughout. I am really enthusiastic about contributing to Postman's open source projects, and to this project in particular, based on my past exposure to OpenAPI Web Search. As a potential GSoC 2024 contributor, I am certain I can add value to this project and the team.

LordRona · 2023-12-01T04:35:05Z

Hello @vinitshahdeo, could you please recommend a first issue for me while I prepare for GSoC 2024? I read the contribution terms, and they say I should get in touch with the main mentor.

benjagm · 2024-02-20T10:28:13Z

Closed as completed as part of 2023 edition.

jansche mentioned this issue Feb 3, 2023

GitHub OpenAPI Search #8

Closed

jansche added the ideas, question, and needs refinement labels Feb 3, 2023

jansche assigned kinlane and unassigned kinlane Feb 3, 2023

jansche added the final label and removed the question and needs refinement labels Feb 6, 2023

jansche mentioned this issue Feb 6, 2023

Schema.org OpenAPI Catalog #6

Closed

jansche added the OpenAPI label Mar 16, 2023

benjagm closed this as completed Feb 20, 2024
