OpenAPI Web Search #7

Closed
jansche opened this issue Feb 3, 2023 · 41 comments

Comments

@jansche
Contributor

jansche commented Feb 3, 2023

Summary: Develop an open-source approach to finding Swagger and OpenAPI definitions on the open web: crawling web pages to look for API definitions, validating them, and then consuming and indexing them as part of an ongoing search. The goal is to give developers a simple way to find existing APIs by locating their documentation, repositories, and other common artifacts of running an API.
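The "validating them" step can start with a cheap structural filter that discards obvious non-definitions before any full schema validation. Below is a minimal Python sketch, assuming candidates have already been fetched and parsed into dictionaries. The function name `detect_spec_version` is hypothetical and not part of the project:

```python
import json
from typing import Any, Optional


def detect_spec_version(doc: Any) -> Optional[str]:
    """Cheap structural check on a parsed JSON/YAML document.

    Returns "openapi-3.x", "swagger-2.0", or None. This is a pre-filter only,
    not a substitute for validating against the official schemas.
    """
    if not isinstance(doc, dict):
        return None
    info = doc.get("info")
    # Both formats require an "info" object containing "title" and "version".
    if not isinstance(info, dict) or "title" not in info or "version" not in info:
        return None
    version = doc.get("openapi")
    if isinstance(version, str) and version.startswith("3."):
        # 3.0 requires "paths"; 3.1 accepts "paths", "components", or "webhooks".
        if any(key in doc for key in ("paths", "components", "webhooks")):
            return "openapi-3.x"
        return None
    # Swagger 2.0 requires swagger: "2.0" plus "paths".
    if doc.get("swagger") == "2.0" and "paths" in doc:
        return "swagger-2.0"
    return None


candidate = json.loads('{"openapi": "3.0.3", "info": {"title": "Pets", "version": "1.0"}, "paths": {}}')
print(detect_spec_version(candidate))  # openapi-3.x
```

Documents that pass this check would still need full validation against the official OpenAPI/Swagger JSON Schemas before they are indexed.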

Skills: Knowledge of the web and of how to crawl web pages, follow URLs, or use an existing solution like Common Crawl.

Expected Outcomes: Provide a simple open-source API that abstracts away the complexity of searching the web for specific terms and helps identify APIs in a sea of web pages. A simple interface should kick off an asynchronous search of the web, or of a corpus of web content, looking for APIs. Users can initiate a search, return regularly to see the results build up over time, and retrieve the aggregated results through a simple API.
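The start-a-search-then-poll pattern described above can be sketched as a small job store. This is a minimal, in-memory illustration under stated assumptions: `SearchJob`, `SearchJobStore`, and the `GET /searches/{id}` endpoint mentioned in a comment are hypothetical names, and a real service would back this with a database and a work queue feeding the crawlers:

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class SearchJob:
    """A long-running search whose results accumulate over time."""
    job_id: int
    query: str
    status: str = "running"
    # Keyed by definition URL so repeated discoveries are deduplicated.
    results: dict[str, str] = field(default_factory=dict)


class SearchJobStore:
    """In-memory stand-in for a real database and job queue (hypothetical)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._jobs: dict[int, SearchJob] = {}

    def start(self, query: str) -> int:
        # A real service would also enqueue crawler work here; this only records the job.
        job = SearchJob(job_id=next(self._ids), query=query)
        self._jobs[job.job_id] = job
        return job.job_id

    def add_result(self, job_id: int, url: str, title: str) -> None:
        # Called by background workers whenever they find a matching definition.
        self._jobs[job_id].results[url] = title

    def finish(self, job_id: int) -> None:
        self._jobs[job_id].status = "done"

    def poll(self, job_id: int) -> dict:
        # Roughly what a client might receive from a hypothetical GET /searches/{id}.
        job = self._jobs[job_id]
        return {
            "id": job.job_id,
            "query": job.query,
            "status": job.status,
            "count": len(job.results),
            "results": [{"url": u, "title": t} for u, t in sorted(job.results.items())],
        }


store = SearchJobStore()
job = store.start("payments")
store.add_result(job, "https://example.com/openapi.json", "Example Payments API")
print(store.poll(job)["count"])  # 1
```

Keying results by URL is what lets users come back repeatedly and see an aggregated, deduplicated result set rather than a stream of repeats.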

Possible mentors: @vinitshahdeo + 1-2 additional mentors

Project Repo: https://github.com/postman-open-technologies/openapi-web-search

Size of Project: 175h

Rating: Medium skills level

@jansche
Contributor Author

jansche commented Feb 3, 2023

Could this be merged with #6 and become a 175h project? @kinlane

jansche added the labels ideas, question ("Further information is requested"), and needs refinement ("Still needs more details to qualify as an application-ready project idea") on Feb 3, 2023
jansche assigned and then unassigned kinlane on Feb 3, 2023
jansche added the final label and removed the question and needs refinement labels on Feb 6, 2023
@BabyElias

Hey!
So I have been using Postman for quite some time now (always my go-to for visualising API outputs in the best way possible) and I find this idea really exciting.
Quick question: this requires knowledge of web scraping, right? How can I go about discussing this task with potential mentors and seeking guidance on it?

@Prajwalprakash3722

Prajwalprakash3722 commented Feb 23, 2023

Hello @jansche, this project looks really interesting. Please correct me if I have misunderstood the statement:

Compared with API marketplaces, the proposed solution aims to help developers find APIs that may not be listed on existing marketplaces like RapidAPI. It would crawl the web for Swagger and OpenAPI definitions, index them, and provide access to them through a simple API. This would make it easier for developers to discover and use APIs that are not part of any marketplace but could be relevant to their specific use case.

Crawling the web to find all the Swagger and OpenAPI definitions out there sounds like a Herculean task. Can you tell me more about how we plan to make it happen? Are we talking about building an army of independent web crawlers, or do you have something else in mind?

Exciting stuff, nevertheless!

@jansche
Contributor Author

jansche commented Feb 24, 2023

Hi folks,
we're currently coordinating mentors and will provide more details and answer questions at the beginning of next week (the week of February 27). Please bear with us.
Best regards
Jan

@Prajwalprakash3722

Cool :)

@ankit-pn

ankit-pn commented Feb 25, 2023

Greetings, everyone! I'm Ankit Kumar, a 3rd-year CS student at NIT Bhopal. I am excited about this project and find it particularly intriguing. Based on the project summary, the goal seems to be a search engine for OpenAPI and Swagger definitions that surfaces reliable, functional APIs. It is important to validate every OpenAPI and Swagger definition to ensure its reliability and accuracy.

I have experience using the Common Crawl Index and have previously worked with OpenAPI in my own side projects, and I would love to contribute to this project as a GSoC mentee this summer.

@ankit-pn

I think merging it with #8 will be a good idea for a 175h project!!

@Kd-Here

Kd-Here commented Feb 26, 2023

I know how to crawl the web and use Common Crawl.
Let us know when mentors are assigned to this task; I'm waiting for it.

@destrex271

This idea seems great! Can't wait to work on it.

@ankit-pn

ankit-pn commented Mar 1, 2023

Has a mentor been assigned to this project yet, @jansche?

@vinitshahdeo
Contributor

Hey @ankit-pn, Glad to see you here. I will be mentoring this project.

@Prajwalprakash3722

Hey @vinitshahdeo, was my assumption correct?

#7 (comment)

@ankit-pn

ankit-pn commented Mar 1, 2023

I'm glad to have you as a mentor, @vinitshahdeo.

I do have some doubts regarding this project.

There are two ways to get OpenAPI definitions from the open web:

  1. Crawling the web with our own custom crawlers (spiders)
  2. Using the Common Crawl dataset (Common Crawl updates its dataset every month)

For both approaches, we would need to define a list of sites (e.g., apis.guru, github.com, and other sites likely to host OpenAPI definitions).

We could use the whole Common Crawl dataset to look for OpenAPI definitions without defining a list of sites, but the dataset is huge (around 300 TB), and scraping OpenAPI definitions from it and storing them to build a search engine would be very computationally expensive, IMO.

Is there any workaround for this ?
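One possible workaround, offered as an assumption rather than the project's chosen design, is to query Common Crawl's URL index for the seed sites instead of scanning the full archives. The sketch below only builds an index query URL and filters the newline-delimited JSON that the index returns. The crawl ID, seed list, and filename hints are illustrative assumptions:

```python
import json
from urllib.parse import urlencode

# Example crawl ID (assumption): pick a current one from the index server's list of crawls.
CRAWL_ID = "CC-MAIN-2023-06"
# Hypothetical seed list, echoing the sites mentioned above.
SEED_SITES = ["apis.guru", "github.com"]
# Filename hints that often indicate a definition file (heuristic, not exhaustive).
DEFINITION_HINTS = ("openapi", "swagger")


def index_query_url(site: str, crawl_id: str = CRAWL_ID) -> str:
    """Build a Common Crawl URL-index query for every capture under one site."""
    params = urlencode({"url": f"{site}/*", "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{params}"


def candidate_records(ndjson_text: str) -> list[dict]:
    """Keep successful captures whose URL looks like a definition file.

    The index answers with one JSON object per line.
    """
    records = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        url = record.get("url", "").lower()
        if str(record.get("status")) == "200" and any(h in url for h in DEFINITION_HINTS):
            records.append(record)
    return records


for site in SEED_SITES:
    print(index_query_url(site))
```

Each index record also carries `filename`, `offset`, and `length` fields, which allow fetching just that capture from the WARC files with an HTTP Range request rather than downloading whole archives. Broad prefixes such as `github.com/*` produce very large result sets, so in practice the seed list would likely need narrower prefixes or paginated queries.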

@priyanshu-kun

Hey @vinitshahdeo, was my assumption correct?

#7 (comment)

I have the same question; could you please clarify?

@priyanshu-kun

Hey devs,
My name is Priyanshu Sharma, and I've completed my bachelor's in computer science. I'm really excited about this project, and I find it a perfect match for my current skills. If I understand correctly, the project asks us to find Swagger and OpenAPI definitions and list them in a frontend web application. Would that application work like a search engine for Swagger and OpenAPI definitions?

Overall, the project is really interesting, and the mentors can count on me.

@vinitshahdeo
Contributor

Hello everyone,

Glad to see the engagement here. In a nutshell, the idea is to build a search engine for valid API definitions. We'd be happy to hear your thoughts before we share our roadmap. The concrete roadmap will be shared once we create a dedicated repository for the project.

PS: We love your ideas, so let's brainstorm! Keep sharing your approaches along with their pros and cons. Heads up! Please think API-first and consider an end-to-end solution, from the backend to the user interface.
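Since the stated goal is a search engine for valid API definitions, the indexing side can be illustrated with a toy in-memory inverted index. This is a minimal sketch that assumes definitions have already been validated and parsed into dictionaries. `DefinitionIndex` is a hypothetical name, and a real service would more likely rely on a dedicated search engine with ranking:

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; path templates like /pets/{id} split cleanly."""
    return set(TOKEN.findall(text.lower()))


class DefinitionIndex:
    """Tiny inverted index over validated definitions (illustrative only)."""

    def __init__(self) -> None:
        # token -> set of definition URLs containing that token
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, url: str, spec: dict) -> None:
        info = spec.get("info", {})
        # Index the human-readable metadata plus every path template.
        parts = [info.get("title") or "", info.get("description") or ""]
        parts.extend(spec.get("paths", {}))
        for token in tokens(" ".join(parts)):
            self.postings[token].add(url)

    def search(self, query: str) -> list[str]:
        # AND semantics: a definition must contain every query term.
        terms = tokens(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


index = DefinitionIndex()
index.add("https://a.example/openapi.json", {"info": {"title": "Pet Store"}, "paths": {"/pets": {}}})
print(index.search("pet store"))  # ['https://a.example/openapi.json']
```

AND semantics keep multi-word queries precise. Ranking, for example by term frequency, would be the natural next step.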

@priyanshu-kun

@vinitshahdeo, could you please help me? I feel very conflicted here: there are two ways to fetch OpenAPI definitions, web scraping and Common Crawl, and both have their pros and cons.
Web scraping might be a good option if you only need to extract data from a few websites and have the technical know-how to set up and manage a scraping solution; it also gives you more control over the data. Common Crawl might be the better option if you need to extract data from many websites while avoiding legal pitfalls, though it gives you less control over the data.
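If the custom-crawler route is chosen, respecting each site's robots.txt is a basic part of avoiding those pitfalls. Below is a minimal sketch using Python's standard `urllib.robotparser`, assuming the robots.txt body has already been downloaded. The user-agent string `oaws-crawler` and the helper names are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent name for the project's crawler.
USER_AGENT = "oaws-crawler"


def build_robot_rules(robots_txt: str, site: str) -> RobotFileParser:
    """Parse a robots.txt body that has already been downloaded for `site`."""
    parser = RobotFileParser()
    parser.set_url(f"https://{site}/robots.txt")
    parser.parse(robots_txt.splitlines())
    # Mark the rules as freshly fetched so can_fetch() does not treat them as unread.
    parser.modified()
    return parser


def allowed(parser: RobotFileParser, url: str) -> bool:
    """Check a URL before fetching it."""
    return parser.can_fetch(USER_AGENT, url)


rules = build_robot_rules("User-agent: *\nDisallow: /private/\nCrawl-delay: 5", "example.com")
print(allowed(rules, "https://example.com/openapi.json"))          # True
print(allowed(rules, "https://example.com/private/openapi.json"))  # False
print(rules.crawl_delay(USER_AGENT))                               # 5
```

Honoring `Crawl-delay` when scheduling requests to the same host is the other half of polite crawling. Common Crawl sidesteps both concerns because its crawler has already done the fetching.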

@vishvjeet-thakur

Hey @vinitshahdeo, I'm Vishvjeet. I think using Common Crawl to get data from most websites would be more efficient, since we have to find as many OpenAPI definitions as we can, and using an already available dataset would also save us time.

@ankit-pn

ankit-pn commented Mar 3, 2023

I don't think choosing Common Crawl over self-made crawl bots makes much difference to the complexity of the problem we have to deal with. Common Crawl contains either raw HTML data or plain text extracted from those HTML pages (the plain text only makes sense for NLP-related work), and extracting openapi.yaml/openapi.json files is easier (at least for me) from raw HTML than from plain text.

For me, getting raw HTML from the open web with self-made crawl bots or from Common Crawl is of the same complexity; the really tough part is scraping those HTML pages for OpenAPI definitions.

What do you say, @vinitshahdeo? Also, a Slack or Discord channel for further communication on this project would be extremely helpful.
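For the step described as the really tough part, a first pass could look for links that point at likely definition files in the raw HTML. Below is a minimal sketch using Python's standard `html.parser`. The filename regex is a heuristic assumption and will miss definitions loaded dynamically, for example through a Swagger UI configuration in JavaScript:

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Heuristic (assumption): the last path segment mentions openapi/swagger and ends in .json/.yaml/.yml.
DEFINITION_LINK = re.compile(r"(openapi|swagger)[^/]*\.(json|ya?ml)$", re.IGNORECASE)


class DefinitionLinkFinder(HTMLParser):
    """Collect absolute URLs of links that look like OpenAPI/Swagger files."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.found: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Check any href/src attribute: <a href>, <link href>, <script src>, etc.
        for name, value in attrs:
            if name in ("href", "src") and value:
                absolute = urljoin(self.base_url, value)
                # Ignore query strings and fragments when matching the file name.
                path = absolute.split("?", 1)[0].split("#", 1)[0]
                if DEFINITION_LINK.search(path) and absolute not in self.found:
                    self.found.append(absolute)


def find_definition_links(html: str, base_url: str) -> list[str]:
    finder = DefinitionLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.found


page = '<a href="/docs/openapi.json">Spec</a> <a href="about.html">About</a>'
print(find_definition_links(page, "https://api.example.com/index.html"))
# ['https://api.example.com/docs/openapi.json']
```

The links found this way would then be fetched and passed through a structural check before indexing.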

@priyanshu-kun

priyanshu-kun commented Mar 3, 2023

@ankit-pn
I think this should be clarified soon, as we need to design the web app system and write a proposal for it.

@destrex271

@jansche are we supposed to use any specific language for this or is it open for us to choose?

@MikeRalphson

@destrex271 the choice of technology stack will be up to the candidates.

@simrann20

Hey!

I am Simrann, a postgraduate student in CS and AI at IIITD. I have been using Postman for a long time and am very keen on contributing to it as a GSoC 2023 student. I am quite intrigued by this idea and have a clear sense of how to plan the project.

I would love to contribute to this as a GSoC 2023 student under the guidance of @vinitshahdeo.

@ph1ne4s

ph1ne4s commented Mar 9, 2023

Hey everyone! I am Aviral Jain, currently pursuing a B.Tech (2nd year) at IIT Roorkee and working on projects involving the MERN stack, Python, and C++.
I am also interested in cybersecurity and robotics.
I have been using Postman and would like to contribute to this project under GSoC 2023.

@monstajoe2002

monstajoe2002 commented Mar 9, 2023

Hello everyone, my name is Youssef Amr. I'm currently majoring in Software Engineering, and I love building new applications and working on projects, which I hope to do this year.
My programming experience includes Java, JavaScript, Python, Rust, and C++.
I also have a YouTube channel where I showcase programming content.
My interests include tech and web development topics such as frameworks and technologies.
I have used Postman before, and I want to know how to get involved with this GSoC organization, possibly with @vinitshahdeo.

@monstajoe2002

Can you assign me this issue?

@Rishabh42

@KAWALMEET-SINGH, I would suggest you be original; plagiarising won't get you anywhere.
I know open source can be a bit tempting, but please refrain from malpractices like copying others' code/comments.

Hope you understand

@monstajoe2002

I didn't plagiarize anyone's code

@ankit-pn

@vinitshahdeo, since this is going to be a completely new project, it would be great if you could specify the mandatory qualification task for it.

@vinitshahdeo
Contributor

👋 Hello everyone!

Glad to see all of you engaging here. We created the repo for this project - postman-open-technologies/openapi-web-search and tried our best to answer all of your questions in the README.md.

Please use this thread for any further doubts.

cc/ @ankit-pn @monstajoe2002 @simrann20 @Rishabh42 @ph1ne4s @destrex271 @priyanshu-kun @Prajwalprakash3722 @money8203 @BabyElias

@khsh13

khsh13 commented Mar 15, 2023

Hello everyone, I am Khushi Sharma, a 2nd-year B.Tech student at IGDTUW. I am a MERN stack web developer currently working with the Postman API, and I have also done web3 projects.
I started my journey into APIs with Postman, and this project really excites me. I will definitely learn a lot by working on it under the guidance of @vinitshahdeo for GSoC 2023.

@MPrashanthR

Dear @vinitshahdeo

I am excited to announce that I have completed my training as a Full Stack Developer, and I am eager to contribute my skills to open-source projects. I am particularly interested in the project proposal to develop an open-source approach to finding Swagger and OpenAPI definitions on the open web.

As a Full Stack Developer, I have experience with web development, crawling web pages, and following URLs. I am confident that I have the skills required to create a simple open-source API that will help developers find APIs in a sea of web pages.

I am looking for a mentor @vinitshahdeo @jansche who can guide me through this project and help me learn and grow as a developer. I am committed to putting in the time and effort required to complete this project successfully and contribute to the open source community.

Thank you for considering my application, and I look forward to hearing back from you soon.

@vinitshahdeo
Contributor

👋 Hello everyone,

I wrote a public blog post about this project idea - how OAWS can help unleash the power of OpenAPI!

Hope it helps—vinitshahdeo.dev/open-api-web-search

@hemanth9398

Dear @vinitshahdeo
I am excited to contribute to Postman's open-source projects. I have worked with CI/CD pipelines and have used Postman to make HTTP requests for applications, so I am familiar with working with Postman.
I am looking for a mentor (@vinitshahdeo, @jansche) who can guide me through this project and help me develop a complete understanding of building applications with Postman. I am committed to putting in the time and effort required to complete this project successfully and contribute to the open source community.

@ankit-pn

ankit-pn commented Apr 1, 2023

Greetings everyone! I'm Ankit Kumar, 3rd year CS student at NIT Bhopal. I am excited about this project and find it to be particularly intriguing. Based on the project summary, it seems like the goal is to create a search engine for OpenAPI and Swagger, which will provide reliable and functional APIs. It is important to validate every OpenAPI and Swagger definition to ensure its reliability and accuracy.

I have experience using Common Crawl Index and have previously worked with OpenAPI on my own side projects and would love to contribute to this project as a GSoC mentee this summer.

@thelifeofshubh, you just copied my whole introduction. Plagiarism (at least when introducing yourself) will lead you nowhere, so just try to be authentic.

@thelifeofshubh

Hey there, @ankit-pn! I was just curious whether the mentor has been active lately. Have you noticed any recent activity from them?

@Prajwalprakash3722

Prajwalprakash3722 commented Apr 1, 2023

Hey there, @ankit-pn! I was just curious whether the mentor has been active lately. Have you noticed any recent activity from them?

Yes, @MikeRalphson and @vinitshahdeo are pretty active.

@DevMukhtarr

Hey @jansche,
As a backend developer who works with APIs a lot, I feel this project will really help developers who deal with APIs, whether they are starting a new project or fixing issues in an ongoing one. I have experience in web crawling, one of the main skills required to make this successful, and I'd be glad to work on this project.

@LordRona

Hello!
My name is Fon Ronard Sauh, a third-year computer science major at the University of Buea. I have worked on projects that required API calls, and I used Postman throughout. I am really enthusiastic about contributing to Postman's open-source projects, and to this project in particular given my past exposure to OpenAPI Web Search. As a potential contributor under GSoC 2024, I am confident I can add value to this project and the team.

@LordRona

LordRona commented Dec 1, 2023

Hello @vinitshahdeo, could you please recommend a good first issue while I prepare for GSoC 2024? I read the contribution terms, and they say I should get in touch with the main mentor.

@benjagm
Collaborator

benjagm commented Feb 20, 2024

Closed as completed as part of 2023 edition.

benjagm closed this as completed on Feb 20, 2024