OpenAPI Web Search #7
Comments
Hey! |
Hello @jansche, this project looks interesting. Please correct me if I have misunderstood the statement. Comparing this with API marketplaces: the proposed solution aims to help developers find APIs that may not be available on existing marketplaces like RapidAPI by crawling the web for Swagger and OpenAPI definitions, indexing them, and providing access through a simple API interface. This would make it easier for developers to discover and use APIs that are not part of any marketplace but could be relevant to their use case. Crawling the web to find all the Swagger and OpenAPI definitions out there sounds like a Herculean task. Can you tell me more about how we plan to make it happen? Are we talking about building an army of web crawlers ourselves, or do you have something else in mind? Nevertheless, exciting stuff! |
Hi folks, |
Cool :) |
Greetings everyone! I'm Ankit Kumar, a 3rd-year CS student at NIT Bhopal. I am excited about this project and find it particularly intriguing. Based on the project summary, the goal seems to be a search engine for OpenAPI and Swagger definitions that surfaces reliable, functional APIs. It is important to validate every OpenAPI and Swagger definition to ensure its reliability and accuracy. I have experience using the Common Crawl index and have worked with OpenAPI on my own side projects, and I would love to contribute to this project as a GSoC mentee this summer. |
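As a rough illustration of the validation step mentioned above, here is a minimal sketch of a cheap structural pre-check. It is an assumption about how a first filter could look, not the project's method: the function name `looks_like_api_definition` is hypothetical, it handles JSON only (YAML would need a third-party parser), and it does not replace validation against the full OpenAPI schema.

```python
import json


def looks_like_api_definition(text: str) -> bool:
    """Cheap structural pre-check; NOT full OpenAPI schema validation."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(doc, dict):
        return False

    # OpenAPI 3.x declares "openapi": "3.x.y"; Swagger 2.0 declares "swagger": "2.0".
    version = doc.get("openapi") or doc.get("swagger")
    if not isinstance(version, str):
        return False

    # Both formats require an "info" object carrying "title" and "version".
    info = doc.get("info")
    if not isinstance(info, dict) or "title" not in info or "version" not in info:
        return False

    # OpenAPI 3.1 accepts any one of paths/components/webhooks;
    # Swagger 2.0 and OpenAPI 3.0 require "paths".
    if "openapi" in doc and version.startswith("3.1"):
        return any(key in doc for key in ("paths", "components", "webhooks"))
    return isinstance(doc.get("paths"), dict)
```

Documents that pass a check like this would still need a real validator before being indexed.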
I think merging it with #8 will be a good idea for a 175h project!! |
I know how to web crawl and use Common Crawl, |
This idea seems great! Can't wait to work on it. |
Is any mentor assigned to this project yet @jansche ? |
Hey @ankit-pn, Glad to see you here. I will be mentoring this project. |
Hey @vinitshahdeo was my assumption correct? |
I am glad to see you as mentor, @vinitshahdeo. I do have some doubts regarding this project. There are 2 ways to get OpenAPI definitions from the open web.
For both approaches we need to define a list of sites (e.g. apis.guru, github.com, and other sites likely to host OpenAPI definitions). We could instead search the whole Common Crawl dataset for OpenAPI definitions without defining a list of sites, but this dataset is huge (around 300TB), and scraping OpenAPI definitions from it and storing them to build a search engine would be very computationally expensive, IMO. Is there any workaround for this? |
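One possible workaround, sketched here under assumptions rather than as the project's chosen approach: Common Crawl publishes a URL index that can be queried per crawl, so candidate URLs can be found without downloading the whole dataset. The crawl ID and the filename list below are placeholders, and both helper names are hypothetical. The sketch only builds the query URL and filters index lines; it does not perform any network request.

```python
import json
from urllib.parse import urlencode

INDEX_HOST = "https://index.commoncrawl.org"
# Assumed list of common definition filenames; real deployments vary.
CANDIDATE_NAMES = (
    "openapi.json", "openapi.yaml", "openapi.yml",
    "swagger.json", "swagger.yaml", "swagger.yml",
)


def build_index_query(crawl_id: str, url_pattern: str) -> str:
    """Build a URL-index query for one crawl, asking for JSON-lines output."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{params}"


def candidate_records(index_lines):
    """Yield index records whose URL path ends in a known definition filename."""
    for line in index_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        path = record.get("url", "").split("?", 1)[0].lower()
        if path.endswith(CANDIDATE_NAMES):
            yield record


# Usage with a placeholder crawl ID and two hand-written sample index lines.
query = build_index_query("CC-MAIN-2023-06", "apis.guru/*")
sample = [
    '{"url": "https://example.com/docs/openapi.json", "status": "200"}',
    '{"url": "https://example.com/index.html", "status": "200"}',
]
matches = list(candidate_records(sample))
```

Each index record also carries the archive filename, offset, and length of the capture, so only the matching records would need to be fetched, which keeps the compute and storage cost well below a full scan.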
I have same question also, please clarify that. |
Hey devs, overall the project is really interesting, and the mentor can count on me. |
Hello everyone, Glad to see the engagement here. In a nutshell, the idea is to build a search engine for valid API Definitions. Happy to hear thoughts from you all before we share our roadmap. The concrete roadmap will be shared once we create a dedicated repository for the same. PS: We love your ideas—let's brainstorm! Keep sharing your approaches along with the pros and cons. Heads up! Please think about API First and consider an end-to-end solution from the backend to the user interface. |
@vinitshahdeo, could you please help me? I feel conflicted here: there are two ways to fetch OpenAPI definitions, web scraping and Common Crawl, and both options have their pros and cons. |
Hey @vinitshahdeo, I'm Vishvjeet. I think using Common Crawl to get data from most websites would be more efficient, since we need to find as many OpenAPI definitions as we can, and using the already available dataset will also save us time. |
I think using Common Crawl versus self-made crawl bots doesn't make much difference to the complexity of the problem we have to deal with. Common Crawl itself contains either raw HTML or plain text extracted from those HTML pages (using the plain text only makes sense for NLP-related work), and extracting openapi.yaml/openapi.json files will be easier (at least for me) from raw HTML than from plain text. For me, getting raw HTML from the open web with self-made crawl bots or with Common Crawl is of the same complexity; scraping those HTML pages to get OpenAPI definitions is the really tough part. What do you say, @vinitshahdeo? Also, a Slack or Discord channel for further communication on this project would be extremely beneficial. |
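To make the "scraping raw HTML" step above concrete, here is a minimal sketch using only the Python standard library. It assumes definitions are linked from pages via `href` or `src` attributes with conventional filenames, which will miss many real cases (for example, definitions embedded in JavaScript). The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed list of conventional definition filenames.
CANDIDATE_FILENAMES = {
    "openapi.json", "openapi.yaml", "openapi.yml",
    "swagger.json", "swagger.yaml", "swagger.yml",
}


class DefinitionLinkFinder(HTMLParser):
    """Collect absolute URLs of links that point at likely definition files."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                filename = urlparse(absolute).path.rsplit("/", 1)[-1].lower()
                if filename in CANDIDATE_FILENAMES:
                    self.links.append(absolute)


def find_definition_links(html: str, base_url: str) -> list:
    finder = DefinitionLinkFinder(base_url)
    finder.feed(html)
    return finder.links


page = '<a href="/docs/openapi.json">Spec</a> <a href="/about">About</a>'
links = find_definition_links(page, "https://example.com/")
```

The same function works on HTML from either source, a self-made crawler or a Common Crawl capture, which matches the point that the two approaches differ little at this stage.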
@ankit-pn |
@jansche are we supposed to use any specific language for this or is it open for us to choose? |
@destrex271 the choice of technology stack will be up to the candidates. |
Hey! I am Simrann, a postgraduate student in CS and AI from IIITD. I have been using Postman for a long time and am very keen on contributing to it as a GSoC 2023 student. I am quite intrigued by this idea and have clarity on how to plan this project. I would love to contribute under the guidance of @vinitshahdeo. |
Hey everyone! I am Aviral Jain, currently pursuing a B.Tech (2nd year) at IIT Roorkee and working on projects involving the MERN stack, Python, and C++. |
Hello everyone, my name is Youssef Amr. I'm currently pursuing a major in Software Engineering and I love building new applications and working on projects, which I hope to do this year. |
Can you assign me this issue? |
@KAWALMEET-SINGH I would suggest you be original; plagiarising won't land you anywhere. Hope you understand. |
I didn't plagiarize anyone's code |
@vinitshahdeo, since this is going to be a completely new project to implement, it would be great if you could specify the mandatory qualification task for this project. |
👋 Hello everyone! Glad to see all of you engaging here. We created the repo for this project - postman-open-technologies/openapi-web-search and tried our best to answer all of your questions in the Please use this thread for any further doubts. cc/ @ankit-pn @monstajoe2002 @simrann20 @Rishabh42 @ph1ne4s @destrex271 @priyanshu-kun @Prajwalprakash3722 @money8203 @BabyElias |
Hello everyone, I am Khushi Sharma, a 2nd-year B.Tech student at IGDTUW. I am a MERN stack web developer currently working with the Postman API. I have also done web3 projects. |
Dear @vinitshahdeo I am excited to announce that I have completed my training as a Full Stack Developer and I am eager to contribute my skills to open projects. I am particularly interested in the project proposal to develop an open-source approach to finding Swagger and OpenAPI definitions on the open web. As a Full Stack Developer, I have experience with web development, crawling web pages, and following URLs. I am confident that I have the skills required to create a simple open-source API that will help developers find APIs in a sea of web pages. I am looking for a mentor @vinitshahdeo @jansche who can guide me through this project and help me learn and grow as a developer. I am committed to putting in the time and effort required to complete this project successfully and contribute to the open source community. Thank you for considering my application, and I look forward to hearing back from you soon. |
👋 Hello everyone, I wrote a public blog post about this project idea - how OAWS can help unleash the power of OpenAPI! Hope it helps—vinitshahdeo.dev/open-api-web-search |
Dear @vinitshahdeo |
@thelifeofshubh you just copied my whole introduction text. Plagiarism (at least in introducing yourself) will lead you nowhere, so just try to be authentic. |
Hey there, @ankit-pn! I was just curious whether the mentors have been active lately. Have you noticed any recent activity from them? |
yes, @MikeRalphson , @vinitshahdeo are pretty active |
hey @jansche |
Hello! |
Hello @vinitshahdeo, could you please recommend a first issue for me while I prepare for GSoC 2024? I read the contribution terms, and they say I should get in touch with the main mentor. |
Closed as completed as part of 2023 edition. |
Summary: Develop an open-source approach to finding Swagger and OpenAPI definitions on the open web: crawling web pages looking for API definitions, validating them, and then consuming and indexing them as part of an ongoing search. This provides a simple way for developers to find APIs that exist by finding documentation, repositories, and other common aspects of running an API.
Skills: Knowledge of the web, and how to crawl web pages, follow URLs, or utilize an existing solution like Common Crawl.
Expected Outcomes: Provide a simple open-source API that abstracts away the complexity of searching the web for specific terms, helping identify APIs in a sea of web pages. Provide a simple interface that sets in motion an asynchronous search of the web, or of a corpus of web content, looking for APIs. Users can initiate a search, then return regularly to see the results build up over time, with the results aggregated for easy pulling via a simple API.
Possible mentors: @vinitshahdeo + 1-2 additional mentors
Project Repo: https://github.com/postman-open-technologies/openapi-web-search
Size of Project: 175h
Rating: Medium skills level
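The asynchronous flow described under Expected Outcomes (start a search, come back later, pull accumulated results) could be sketched roughly as follows. This is an in-memory illustration under assumptions, not the project's design: `SearchService`, `SearchJob`, and all method names are hypothetical, and a real service would sit behind an HTTP API with persistent storage and background workers.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class SearchJob:
    query: str
    status: str = "running"
    results: list = field(default_factory=list)


class SearchService:
    """In-memory stand-in for an asynchronous search API."""

    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)

    def start_search(self, query: str) -> int:
        """Register a search and return an ID the caller can poll later."""
        job_id = next(self._ids)
        self._jobs[job_id] = SearchJob(query)
        return job_id

    def record_result(self, job_id: int, url: str) -> None:
        """Called by a crawler worker whenever it finds a definition."""
        job = self._jobs[job_id]
        if url not in job.results:  # aggregate without duplicates
            job.results.append(url)

    def finish(self, job_id: int) -> None:
        self._jobs[job_id].status = "done"

    def poll(self, job_id: int) -> dict:
        """Return the results accumulated so far, plus the job status."""
        job = self._jobs[job_id]
        return {"query": job.query, "status": job.status, "results": list(job.results)}


service = SearchService()
job_id = service.start_search("payments")
service.record_result(job_id, "https://example.com/openapi.json")
service.record_result(job_id, "https://example.com/openapi.json")
first_poll = service.poll(job_id)
service.finish(job_id)
second_poll = service.poll(job_id)
```

Returning a job ID immediately and letting clients poll keeps the interface simple while the crawl itself runs for as long as it needs.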