GSoC 2023 Syed Md Mihan Chistie
- Name: Syed Md Mihan Chistie
- University: Dayananda Sagar University, Bangalore
- Email: tubbachistie@gmail.com
- Matrix: @code2go:matrix.org
- GitHub: Mihan786Chistie
- PR link(s): "fixed mapbase autoalign" #6796 by Mihan786Chistie (merged on Feb 27); "adds 'imshow' as an option for autoalign" #6904 by Mihan786Chistie
I am a second-year computer science student with experience in web scraping, Python programming, and data visualization. I have a strong background in software development, particularly web and mobile development, as well as experience in voice assistant development. As a freelancer, I have used Python libraries such as Scrapy and Beautiful Soup for web scraping and Matplotlib for data analysis and exploration, and I am currently learning Seaborn.
I have also worked on mobile development projects using Flutter and Dart and on Android development using Android Studio. On the backend, I have worked with Node.js, Express.js, REST APIs, AWS Lambda, and Firebase. I am proficient in several programming languages, including Java, Python, and C.
I am excited to submit my proposal for the "Scraper Rewrite" project as part of the Google Summer of Code 2023 program. Working on this project would be an excellent opportunity for me to improve my skills in Python programming, web scraping, and data analysis, and to gain experience in astronomy and astrophysics.
The current Scraper used by SunPy to gather information is outdated and inefficient. It is hard to maintain and has become prone to errors. The Scraper class appears to work only for specific parameters and fails for others, for example when a different timeRange parameter is provided. Its regex-based patterns are confusing to use. The code has many bugs that need to be fixed, and it needs to be optimized.
I will first analyze the current scraper code and identify areas for improvement. I will then develop a new, more efficient scraper using Python and modern libraries. The new scraper should be able to extract relevant information from a website (e.g., proba2.oma.be). It should also use parse instead of regex so that search patterns are less confusing for clients. I will develop tests to ensure the scraper works correctly. The scraper should also handle errors and edge cases gracefully, such as inconsistent formatting and timeouts. It should be well documented and easily configurable so that other members of the SunPy community can use it to gather information. The new scraper will require changes to the SunPy codebase, specifically to the code that uses the current scraper.
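As a rough illustration of the "parse instead of regex" idea, the sketch below lets a caller describe a file name with `{year:4d}`-style fields and never write a regular expression directly. It uses only the standard library: the helper `compile_pattern` translates the fields into a regex behind the scenes, which is a stand-in for what the `parse` library offers and not its implementation. The helper names, the supported field specs, and the example file name are assumptions made for this sketch, not SunPy's API.

```python
import re
from datetime import datetime
from string import Formatter

# Hypothetical mapping from a parse-style format spec to a regex fragment.
_SPEC_TO_REGEX = {
    "4d": r"\d{4}",
    "2d": r"\d{2}",
    "": r".+?",
}


def compile_pattern(pattern):
    """Translate e.g. 'img_{year:4d}{month:2d}.fits' into a compiled regex."""
    parts = []
    for literal, field, spec, _ in Formatter().parse(pattern):
        parts.append(re.escape(literal))  # literal text must match exactly
        if field is not None:
            parts.append(f"(?P<{field}>{_SPEC_TO_REGEX[spec]})")
    return re.compile("".join(parts))


def extract_time(regex, filename):
    """Return the datetime encoded in filename, or None if it cannot be read."""
    match = regex.fullmatch(filename)
    if match is None:
        return None  # e.g. an unrelated link such as 'index.html'
    fields = {key: int(value) for key, value in match.groupdict().items()}
    try:
        return datetime(
            fields["year"],
            fields["month"],
            fields["day"],
            fields.get("hour", 0),
            fields.get("minute", 0),
            fields.get("second", 0),
        )
    except ValueError:
        return None  # digits matched but do not form a valid date


# Example pattern; the file name layout is assumed for illustration.
lyra = compile_pattern("lyra_{year:4d}{month:2d}{day:2d}-000000_lev2_std.fits")
print(extract_time(lyra, "lyra_20230711-000000_lev2_std.fits"))
print(extract_time(lyra, "index.html"))
```

Returning `None` for names that do not match, or that match but encode an impossible date, is one way to tolerate inconsistently formatted listings without raising in the middle of a search.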
- PR 6796: Fixed the issue "Autoalign plotting of maps is off by half a pixel" (#6794) with the help of mentors Albert Y. Shih and Nabil Freij. The fix changes the pcolormesh() call to supply the pixel edges explicitly, which prevents the result from being misaligned by half a pixel.
- PR 6904: Addresses the issue "Add 'imshow' as an option for autoalign plotting of maps" (#6812). It adds imshow() as an option for the autoalign parameter when plotting a map.
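The half-pixel offset addressed in PR 6796 can be illustrated with a small sketch. It assumes, for illustration only, that pixel centers sit at integer coordinates 0, 1, ..., N-1, so N pixels have N+1 edges, each half a pixel away from a center. Matplotlib's pcolormesh can take corner coordinates with one more entry than the data along each axis, which is what supplying the edges explicitly refers to. The helper name `pixel_edges` is hypothetical, and this is not the code from the PR.

```python
def pixel_edges(n_pixels):
    """Edges of unit-width pixels whose centers sit at 0, 1, ..., n_pixels - 1."""
    return [i - 0.5 for i in range(n_pixels + 1)]


# Three pixels centered at 0, 1, 2 are bounded by four edges.
print(pixel_edges(3))
```

If the center coordinates were passed where edges are expected, every cell would be drawn shifted by half a pixel relative to these edges.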
I want to rewrite the SunPy Scraper module to improve its efficiency and reliability. I am excited about this project because it will give me the opportunity to work on a large-scale, real-world software project and contribute to the scientific community. I chose this project because I am interested in solar physics and believe it will give me valuable experience in software development and data analysis.
With my proficiency in Python, web scraping, and software development, I have the skills this project needs. My familiarity with the SunPy project will also help me get started quickly.
There have been previous attempts to rewrite the SunPy Scraper module, but none of them have been merged into the main project due to issues with compatibility and reliability. There are also several open issues related to the current Scraper module that will need to be addressed in this project. Some of the issues and PRs are:
- https://github.com/sunpy/sunpy/issues/4888 (redesigning the Scraper class to make it efficient and remove its bugs)
- https://github.com/sunpy/sunpy/pull/6438 (moves the scraper code from sunpy.util to sunpy.net)
- https://github.com/sunpy/sunpy/issues/5217 (the scraper behaves unexpectedly when the TimeRange() arguments are changed slightly, since it is hard to make the code work for every time interval)
- https://github.com/sunpy/sunpy/issues/4493 (the Scraper works with a simple URL pattern but crashes when a complex URL pattern is provided)
- https://github.com/sunpy/sunpy/issues/4336 (proposes refactoring the Scraper to use parse instead of regex)
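Several of the issues above concern how a time range is mapped onto URLs. One possible way to make that mapping insensitive to small changes in the endpoints is to round the start down to a day boundary and then step one day at a time, so a range that starts late in a day still includes that day's directory. This is a sketch under assumptions, not the existing SunPy behavior: the function name `daily_directories`, the one-directory-per-day layout, and the example.org URL are all hypothetical.

```python
from datetime import datetime, timedelta


def daily_directories(pattern, start, end):
    """Expand a strftime pattern for every calendar day touching [start, end]."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    # Round down to midnight so a partial first day is not skipped.
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    urls = []
    while day <= end:
        urls.append(day.strftime(pattern))
        day += timedelta(days=1)
    return urls


# A 45-minute range that crosses midnight touches two daily directories.
for url in daily_directories(
    "https://example.org/data/%Y/%m/%d/",
    datetime(2023, 1, 31, 23, 30),
    datetime(2023, 2, 1, 0, 15),
):
    print(url)
```

Files found in each directory would still need to be filtered against the exact start and end times, since the directories cover whole days.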
- Get to know the mentor and other members of the OpenAstronomy community
- Get familiarized with the existing codebase and project documentation and tools used in the project
- Discuss the project requirements and goals with the mentor and the API Design
- Experiment with the current Scraper and review the current class.
- Implement a basic version of the scraper using BeautifulSoup 4 and requests.
- Figure out whether parse can be used instead of Python regex.
- Functional scraper written
- Finished scraper implementation.
- Testing and debugging.
- First Evaluation (July 11, 2023)
- Partial skeleton of scraper written
- Refactoring code as necessary.
- Adding any missing functionality.
- Implement performance improvements where possible.
- Continue to work on performance improvements.
- Writing documentation and preparing for final evaluation.
- Finalizing any last-minute changes.
- Submit code for review and feedback from mentors.
- Functional replacement ready for review and merging into Sunpy.
- Make final code improvements based on feedback from mentors.
- Submit final code and documentation for review and evaluation.
No
No
Yes
I have exams during the community bonding period and Week 6 of the GSoC project and may not be available full-time during those days. However, I will communicate any updates or changes in my availability. After the exams, I will have a lot of time available to work on the project and can prioritize it. I understand the importance of meeting project deadlines and will communicate any unforeseen circumstances to the mentors.
I have experience with programming in various languages such as Python, C and Java.
I have used Python for various projects, and I have good experience with NumPy and Matplotlib. I have done many Python scraping projects using Beautiful Soup, XPath, Scrapy, and Selenium. Unfortunately, these programs are stored on my old laptop, and I did not upload them to my GitHub.
I have contributed to a few open-source projects, including SunPy. My experience with open source software is limited, but I am excited to learn more about contributing to the SunPy Project.
I have worked on some personal projects where I have used Git for version control.
Proposal Design Format: GSoC 2023 Draft proposal by Saksham-13