GSoC 2024

Ayan Sinha Mahapatra edited this page Mar 26, 2024 · 6 revisions

AboutCode will be applying as a GSoC mentoring org for 2024! See https://summerofcode.withgoogle.com/programs/2024 for more details about the program this year. Here is the complete timeline: https://developers.google.com/open-source/gsoc/timeline

TL;DR See our list of ideas: https://github.com/nexB/aboutcode/wiki/GSOC-2024-project-ideas

This page contains information for aspiring contributors interested in participating and helping with the GSoC 2024 program.

AboutCode: Scan code for origin, license and vulnerabilities

AboutCode is a family of FOSS projects to uncover data ... about software code:

  • where does the code come from? which software package?
  • what is its license? copyright?
  • is the code vulnerable, maintained, well coded?
  • what are its dependencies? are there vulnerabilities or licensing issues?

All these are questions that are important to answer: there are millions of free and open source software components available on the web for reuse.

Knowing where a software package comes from, what its license is and whether it is vulnerable should be a problem of the past such that everyone can safely consume more free and open source software. We support not only open source software, but also open data, generated and curated by our applications.

Join us to make it so!

Our tools are used to help detect and report the origin and license of source code, packages and binaries, as well as to discover software and package dependencies and to track security vulnerabilities, bugs and other important software package attributes. They also support creating SBOMs and other disclosure documents with this information, and support leading standards like SPDX, CycloneDX and VEX. They are a suite of database-backed web and API servers, command line applications and desktop applications, often working together to create and provide data about software usability and health.

AboutCode projects are...

NOTE: If you are looking for the Project Ideas List instead of their parent Projects, see https://github.com/nexB/aboutcode/wiki/GSOC-2024-project-ideas

The AboutCode project repositories that are the main focus of GSoC 2024 are:

  • purlDB consists of tools to create and expose a database of purls (Package URLs) and also has package data for all of these packages created from scans.

  • VulnerableCode is a web-based API and database to collect and track all the known software package vulnerabilities, with affected and fixed packages and references. It also includes a standalone tool, Vulntotal, to compare this vulnerability information across similar tools.

  • ScanCode.io is a web application and API to run and review scans in rich scripted ScanPipe pipelines, on different kinds of inputs (containers, Docker images, package archives, source packages, manifests etc.), to get information on origin, licenses and vulnerabilities.

  • ScanCode Toolkit is a popular command line tool to scan code for licenses, copyrights and packages, used by many organizations and FOSS projects, small and large.

GSoC proposals for the repositories above will receive the most interest from AboutCode mentors.

There are many other AboutCode projects:

  • DejaCode is a complete enterprise-level web application to automate open source license compliance and ensure software supply chain integrity. This was open-sourced recently.

  • univers is a package to parse and compare all the package versions and all the ranges.

  • FetchCode is a library to reliably fetch any code via HTTP, FTP and version control systems such as git.

  • python-inspector and nuget-inspector inspect manifests and code to resolve dependencies (vulnerable and non-vulnerable) for Python and NuGet packages respectively.

  • ScanCode Workbench is a TypeScript and React based desktop application to visualize and review scan results for scancode-toolkit scans.

  • AboutCode Toolkit is a command line tool to document and inventory known packages and licenses and generate attribution docs, typically using the results of analyzed and reviewed scans.

  • TraceCode Toolkit is a command line tool to find which source code file is used to create a compiled binary by tracing and graphing a build.

  • DeltaCode is a command line tool to compare scans and determine if and where there are material differences that affect licensing.

  • license-expression is a library to parse, analyze, simplify and render boolean license expressions (such as SPDX license expressions).

  • container-inspector is a command line tool to analyze the code in Docker and container images.
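
To give a flavor of the kind of problem a version library such as univers deals with, here is a minimal sketch of numeric version comparison. It is an illustration only and does not use or mirror the univers API; the helper names `version_key` and `is_in_range` are hypothetical, and it assumes purely numeric dotted versions, which real package ecosystems frequently violate.

```python
def version_key(version: str) -> tuple:
    """Turn a dotted numeric version such as "1.10.0" into (1, 10, 0)."""
    return tuple(int(part) for part in version.split("."))


def is_in_range(version: str, lower: str, upper: str) -> bool:
    """Return True if lower <= version < upper, comparing numerically."""
    return version_key(lower) <= version_key(version) < version_key(upper)


# Plain string comparison gets this wrong because "1" sorts before "9".
print("1.10.0" > "1.9.0")                            # False
print(version_key("1.10.0") > version_key("1.9.0"))  # True
print(is_in_range("2.3.1", "2.0.0", "3.0.0"))        # True
```

Pre-release tags, epochs and ecosystem-specific ordering rules are deliberately left out of this sketch.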

We have also co-founded and/or contribute to important projects for other organizations:

  • Package URL is an emerging standard to reference software packages of all types with simple, readable and concise URLs.

  • SPDX, aka Software Package Data Exchange, is a spec to document the origin and licensing of packages.

  • CycloneDX, aka OWASP CycloneDX, is a full-stack Bill of Materials (BOM) standard that provides advanced supply chain capabilities for cyber risk reduction.

  • ClearlyDefined to review and help FOSS projects improve their licensing and documentation clarity.
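
Since Package URLs (purls) come up in many of the projects above, here is a minimal sketch of how a purl of the form `pkg:type/namespace/name@version?qualifiers#subpath` can be split into its components. This is a simplified illustration using only the standard library; the `SimplePurl` class and `parse_purl` function are hypothetical names, and the sketch skips most of the normalization and validation rules of the Package URL specification. For real work, prefer a maintained purl library.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import unquote


@dataclass
class SimplePurl:
    # Hypothetical container for illustration, not a real library class.
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: dict = field(default_factory=dict)
    subpath: Optional[str] = None


def parse_purl(purl: str) -> SimplePurl:
    """Split a well-formed purl string into its components (simplified)."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a purl: {purl!r}")
    rest = rest.lstrip("/")

    # The optional subpath follows '#', the optional qualifiers follow '?'.
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")

    # The package type is the first path segment.
    ptype, sep, rest = rest.partition("/")
    if not sep or not ptype or not rest:
        raise ValueError(f"purl needs a type and a name: {purl!r}")

    # The optional version follows the last '@'.
    if "@" in rest:
        path, _, version = rest.rpartition("@")
    else:
        path, version = rest, ""

    # The name is the last path segment; anything before it is the namespace.
    namespace, _, name = path.strip("/").rpartition("/")
    if not name:
        raise ValueError(f"purl needs a name: {purl!r}")

    qualifiers = {}
    for pair in filter(None, query.split("&")):
        key, _, value = pair.partition("=")
        if value:
            qualifiers[key.lower()] = unquote(value)

    return SimplePurl(
        type=ptype.lower(),
        namespace=unquote(namespace) or None,
        name=unquote(name),
        version=unquote(version) or None,
        qualifiers=qualifiers,
        subpath=subpath.strip("/") or None,
    )


purl = parse_purl("pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1?packaging=sources")
print(purl.type, purl.namespace, purl.name, purl.version, purl.qualifiers)
```

The sketch assumes that any `@` inside a namespace or name is percent-encoded, so the last `@` always introduces the version.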

Contact

Join the chat online at our Element chatrooms:

Please try asking questions the smart way: http://www.catb.org/~esr/faqs/smart-questions.html

For personal issues, you can contact the org admins directly:

  • @pombredanne and pombredanne [at] gmail [dot] com
  • @AyanSinhaMahapatra and asmahapatra [at] nexb [dot] com

Technology

Discovering the origin, license and security of code is a vast topic. We primarily use Python, with some C/C++, Rust and Go for performance-sensitive code. We use Django, PostgreSQL and JavaScript for web apps and API servers.

Our domain includes text analysis and processing (for instance for copyright and license detection), parsing (for package manifest formats), binary analysis (to detect the origin and license of binaries, primarily based on the corresponding source code), vulnerability data aggregation and processing, dependency resolution, mining and matching package data (from scanning and fetching metadata from package managers), and maintaining databases of packages (using PURL). These are realized through web-based tools and APIs (to expose the tools and libraries as web services), scripting in Python to automate workflows, and low-level data structures for efficient matching (such as high-performance string search automatons).
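
As an illustration of what a string search automaton is, here is a minimal sketch of the classic Aho-Corasick algorithm, which finds all occurrences of many patterns in a text in a single pass. It is a teaching sketch under simplifying assumptions (plain Python lists and dicts, exact character matching) and is not meant to describe how any AboutCode tool implements its matching; the function names `build_automaton` and `find_all` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build the goto, failure and output tables for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state (a trie)
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> patterns that end at this state

    # Step 1: insert every pattern in the trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(pattern)

    # Step 2: compute failure links breadth-first from the root's children.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every match found in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for index, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((index - len(pattern) + 1, pattern))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once regardless of the number of patterns, which is why this family of automatons suits matching against large sets of known strings.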

Skills

Incoming students will need the following skills:

  • Intermediate to strong Python programming. For some projects, familiarity with Django and PostgreSQL would be great.

  • Familiarity with git as a version control system. Take the time to learn git!

  • Ability to set up your own development environment

  • An interest in open source security, licensing and generally software composition analysis.

We are happy to help you get up to speed, and the more you are able to demonstrate ability and skills in advance, the more likely we are to choose your application!

About your project application

Make sure you read the GSoC student guide carefully. Also follow the writing a proposal guide.

We expect your application to be in the range of 1000 words. Anything less than that will probably not contain enough information for us to determine whether you are the right person for the job. Your proposal should contain at least the following information, plus anything you think is relevant:

Personal Information

We need this information to communicate with you during the project duration and for other communication related to GSoC.

  • Your name

  • Country/Timezone you are in (just for scheduling purposes)

  • Email and Gitter/Element username

  • Link to your GitHub profile

  • Mention the details of your academic studies, any previous work, internships

  • Relevant skills that will help you to achieve the goal (programming languages, frameworks)

  • Do you plan to have any other commitments during GSoC that may affect your work? Any vacations/holidays? Will you be available full time to work on your project? (Hint: do not bother applying if this is not your main, serious time commitment during the GSoC time frame.) We also have weekly status meetings on Mondays, at the same time as the community call. Would you be able to attend them?

  • We will be following the 12 week standard coding period as default for all our projects, unless unforeseen circumstances arise. Do you accept the standard coding period as default?

Proposal Details

  • Title of your proposal

  • Abstract of your proposal

  • Project Sizes:

    • small (hour)
    • medium (175 hour)
    • large (350 hour). The size should match what we have listed on our project ideas page (and be re-confirmed by the mentors on your proposal). AboutCode will only have medium or large projects, so apply accordingly.
  • Link to the original project idea on the project ideas page (if applicable)

  • Detailed description of your idea, including an explanation of why it is innovative and what it will contribute to the project

    • Explain your data structures and your planned main processing flows in detail.
  • Mention the key deliverables of the project

  • Description of previous work on the same issue, existing solutions (links to prototypes, bibliography are more than welcome)

  • A complete timeline of your project, where the project is broken down into smaller tasks with their own deliverables/goals, by time. Please keep some buffer time at the end and consider that it will take some time to address feedback, write docs and other related work. We will help with this on your proposal.

Note that you have to submit a PDF of your proposal on the GSoC website, and you can keep updating it with a new proposal PDF until the deadline on April 4th at 18:00 UTC.

Your contributions

The best way to demonstrate your capability is to submit a small patch ahead of the project selection, for an existing issue or a new issue.
We will always prefer a project submission where you have submitted a patch over a submission without one.

Note that only useful code contributions demonstrate your ability to successfully complete the project you are proposing, and insignificant/documentation contributions will not support your proposal as much as quality contributions will. Also try to contribute to issues similar to your project idea, for more impact.

  • Any previous open-source projects (or even previous GSoC) you have contributed to and links.

  • Detailed list of your code contributions to aboutcode, by project, with links and brief description.

  • You can also list documentation/other contributions, issues opened etc. optionally.

For example, if you are looking to contribute to SCTK/SCIO/purldb, it would be nice to try adding support for a new package format in SCTK: https://github.com/nexB/scancode-toolkit/issues?q=is%3Aopen+is%3Aissue+label%3Apackage-formats

If you are looking to contribute to vulnerablecode, data collection issues are a nice way to demonstrate that you understand all the related concepts and workflows: https://github.com/nexB/vulnerablecode/issues?q=is%3Aopen+is%3Aissue+label%3A%22Data+collection%22

Note that even if you have contributed to other open-source projects, it is recommended you submit a patch (code contribution, not docs/other minor contributions) to the repo you are proposing a GSoC project for, as this makes you familiar with the repo, and gives us more confidence about your ability to finish the project successfully.

Take feedback on proposal

  • You should share your proposal early to get feedback from mentors.

    • Don't be afraid to share your proposal even if it's a draft; keep updating it after you share.
    • Share your proposal in a publicly viewable Google Doc. It should also have comment access enabled so mentors can provide feedback.
    • Do share the proposal publicly rather than privately to mentors. Respect the spirit of open source! Don't be afraid of plagiarism as we check for it.
  • Discuss the proposal on open issues/public chat/weekly community calls for more feedback and discussion. Act upon feedback already received and keep improving it.

  • Don't wait till the last day/moment to submit your proposal, submit early! The proposal is editable so you can always update later. Announce on the public channel after submitting your proposal on the GSoC website.

Selection criteria

While creating your proposal, think about how we select proposals from all the submissions we get, and use that to make your proposal better. Consider whether a person who is not part of AboutCode could read your proposal and still understand the problem, solution and steps. Also ask yourself whether the mentors will be satisfied with the level of detail and clarity in your proposal. Do you understand the main bottlenecks/challenges? Where do you expect help? Do you think your timeline is reasonable? Do you understand the deliverables correctly and have intermediate goals/deliverables by timeline?

Also consider the main factors we look at when judging proposals: we mentors want successful and impactful GSoC projects. There are a few factors that help predict success:

  • Your contributions: We need to know whether you are capable of finishing the project successfully without significant hand-holding. If you have multiple impactful and accepted code contributions, we know you are comfortable reading and writing code, understand the Git/GitHub and review workflow, and can solve problems yourself with a little bit of help. If your contributions are in the same project or area as your proposal, they also demonstrate your familiarity with the problem space, which is an added bonus. Documentation or other minor contributions are welcome, but do not demonstrate your ability or help your proposal get selected.

  • Proposal clarity and detail: This is discussed in detail above in the Proposal Details section. We need to know you really understand the problem and the solution you are suggesting, and that your tasks and timeline are reasonable.

  • Communication and feedback acceptance: Open source and GSoC are collaborative in nature. As beginners, you are not expected to know everything, so taking feedback from more experienced community members and mentors is key to your success. This should happen at every step: on individual issues/PR reviews, on proposal review and throughout GSoC. So we need to know you can take constructive criticism and keep integrating feedback into your work.

Make sure you follow these guidelines to make your proposal stand out and improve your chances of getting selected. If you are already doing everything mentioned above, you already have a very good chance of being accepted. In the rare case that there are two proposals for the same project, we can only select one, and mentors have to make the hard choice between them based on these factors.