
GSoC 2023 Syed Md Mihan Chistie

Nabil Freij edited this page Feb 22, 2024 · 4 revisions

Rewriting the SunPy Web Scraper for More Efficient Data Retrieval

Contributor Information

I have also worked on mobile development projects using Flutter and Dart, have experience in Android development with Android Studio, and have backend development experience, particularly with Node.js, Express.js, REST APIs, AWS Lambda, and Firebase. I am also proficient in several programming languages, including Java, Python, and C.

I am excited to submit my proposal for the "Scraper Rewrite" project as part of the Google Summer of Code 2023 program. Working on this project would also provide an excellent opportunity for me to improve my skills in Python programming, web scraping, and data analysis, as well as to gain experience in the field of astronomy and astrophysics.

Problem Statement

The current Scraper used by SunPy to gather information is outdated and inefficient. It is hard to maintain and has become prone to errors. The Scraper class seems to work only for specific parameters and fails for others, for example when a different timeRange parameter is provided. Its regex-based patterns are confusing to use. The code also has many bugs that need to be fixed, and it needs to be optimized.

How do you plan to implement the project?

I will first analyze the current scraper code and identify areas for improvement. I will then develop a new, more efficient scraper using Python and modern libraries. The new scraper should be able to extract relevant information from a website (e.g. proba2.oma.be). It should also use parse instead of regex, so that search patterns are less confusing for users to write. I will develop tests to ensure the scraper works correctly. Additionally, the scraper should handle errors and edge cases gracefully, such as inconsistent formatting and timeouts. The new scraper should be well documented and easily configurable so that other members of the SunPy community can use it to gather information. It will also require changes to the SunPy codebase, specifically to the code that uses the current scraper.
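To illustrate why a parse-style pattern may be easier to read than a hand-written regex, here is a minimal sketch. It does not use the parse library itself; it is a standard-library stand-in that supports only fixed-width integer fields such as `{year:4d}`. The helper names (`pattern_to_regex`, `extract_fields`) and the example file name are assumptions made for illustration, not part of SunPy.

```python
import re

# Matches fields of the form {name:Nd}, e.g. {year:4d} (hypothetical mini-syntax).
_FIELD = re.compile(r"\{(\w+):(\d+)d\}")


def pattern_to_regex(pattern):
    """Translate a parse-style pattern into a compiled regular expression."""
    parts = []
    pos = 0
    for match in _FIELD.finditer(pattern):
        # Literal text between fields is escaped so "." and "-" match literally.
        parts.append(re.escape(pattern[pos:match.start()]))
        name, width = match.group(1), match.group(2)
        # Each field becomes a named group of exactly `width` digits.
        parts.append(f"(?P<{name}>\\d{{{width}}})")
        pos = match.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def extract_fields(pattern, text):
    """Return the integer fields found in `text`, or None if it does not match."""
    match = pattern_to_regex(pattern).fullmatch(text)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


# Illustrative file name only; real archive names may differ.
PATTERN = "lyra_{year:4d}{month:2d}{day:2d}-000000_lev2_std.fits"
fields = extract_fields(PATTERN, "lyra_20230611-000000_lev2_std.fits")
print(fields)
```

The user writes the readable pattern on the `PATTERN` line and never sees the generated regex; a real implementation would likely delegate this step to the parse library rather than a hand-rolled converter.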

What are your contributions to the SunPy Project so far?

What do you want to achieve? What excites you about this project? Why did you choose it?

I want to rewrite the SunPy Scraper module to improve its efficiency and reliability. I am excited about this project because it will give me the opportunity to work on a large-scale, real-world software project and contribute to the scientific community. I chose this project because I am interested in the field of solar physics and I believe that this project will provide me with valuable experience in software development and data analysis.

Why are you suited to work on this project?

With my proficiency in Python, web scraping, and software development, I possess the necessary skills for this project. Additionally, my familiarity with the SunPy project will enable me to efficiently initiate this project.

What have other people done on this idea?

There have been previous attempts to rewrite the SunPy Scraper module, but none of them have been merged into the main project due to issues with compatibility and reliability. There are also several open issues related to the current Scraper module, which will need to be addressed in this project. Some of the issues and PRs are:

Timeline

Community Bonding Period (May 4 - May 28)

  • Get to know the mentor and other members of the OpenAstronomy community
  • Get familiar with the existing codebase, the project documentation, and the tools used in the project
  • Discuss the project requirements, goals, and API design with the mentor

Week 1 - 2 (May 29 - June 12)

  • Experimenting with the current Scraper and reviewing the current class.
  • Implementing a basic version of the scraper using BeautifulSoup 4 and requests.
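As a rough idea of what the basic scraper step involves, the sketch below collects file links from an HTML directory listing. To stay self-contained it uses the standard library's `html.parser` in place of BeautifulSoup 4, and it parses an inline sample string instead of fetching a page with requests. The names `LinkCollector`, `list_files`, and `SAMPLE_LISTING`, as well as the file names, are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def list_files(html, suffix=".fits"):
    """Return the links in `html` that end with `suffix`, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if link.endswith(suffix)]


# Inline stand-in for a server directory listing (assumed layout).
SAMPLE_LISTING = """
<html><body>
<a href="../">Parent Directory</a>
<a href="lyra_20230611-000000_lev2_std.fits">lyra_20230611-000000_lev2_std.fits</a>
<a href="lyra_20230612-000000_lev2_std.fits">lyra_20230612-000000_lev2_std.fits</a>
<a href="README.txt">README.txt</a>
</body></html>
"""

files = list_files(SAMPLE_LISTING)
print(files)
```

In the actual project the HTML would come from an HTTP request with a timeout, and the resulting links would then be matched against the file-name pattern for the requested time range.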

Week 3 - 4 (June 13 - June 26)

  • Determine whether parse can be used instead of Python regex.
  • Functional scraper written

Week 5 - 6 (June 27 - July 10)

  • Finished scraper implementation.
  • Testing and debugging.
  • First Evaluation (July 11, 2023)
  • Partial skeleton of scraper written

Week 7 - 8 (July 14 - July 24)

  • Refactoring code as necessary.
  • Adding any missing functionality.
  • Implement performance improvements where possible.

Week 9 - 10 (July 25 - August 21)

  • Continue to work on performance improvements.
  • Writing documentation and preparing for final evaluation.
  • Finalizing any last-minute changes.
  • Submit code for review and feedback from mentors.

Final Week (August 21 - August 28)

  • Functional replacement ready for review and merging into SunPy.
  • Make final code improvements based on feedback from mentors.
  • Submit final code and documentation for review and evaluation.

Have you participated previously in GSoC? When? With which project?

No

Are you also applying to other projects?

No

Are you eligible to receive payments from Google?

Yes

How much time do you plan to invest in the project before, during, and after the Summer of Code?

I have exams during the community bonding period and Week 6 of the GSoC project and may not be available full-time during those days. However, I will communicate any updates or changes in my availability. After the exams, I will have a lot of time available to work on the project and can prioritize it. I understand the importance of meeting project deadlines and will communicate any unforeseen circumstances to the mentors.

What is your experience with programming?

I have experience with programming in various languages such as Python, C and Java.

What is your experience with Python?

I have used Python for various projects, and I have good experience with NumPy and Matplotlib. I have done many Python projects based on web scraping with Beautiful Soup, XPath, Scrapy, and Selenium. Unfortunately, those programs are stored on my old laptop, and I did not upload them to my GitHub.

What is your experience with open source software?

I have contributed to a few open-source projects, including SunPy. My experience with open source software is limited, but I am excited to learn more about contributing to the SunPy Project.

Have you ever used git or another version control system?

I have worked on some personal projects where I have used Git for version control.

References

Proposal Design Format: GSoC 2023 Draft proposal by Saksham-13
