
Testing, Code Guru and Quality Assurance

praveen97uma edited this page Apr 8, 2011 · 13 revisions

Abstract:

The aim of the project is to add an extensive range of unit and functional tests for the logic and views of the SoC framework. The lack of a sufficient number of tests for existing code is hindering rapid development of the framework and delaying the adoption of Test Driven Development (TDD) by Melange. We aim to write a suite of tests that will let developers test their modified code on their own machines before submitting it for review. The suite will also ensure that refactored code continues to work, since it must still pass the tests. The project will be a major step towards our aspiration of switching to the TDD methodology.

Basics

* **Name:** Praveen Kumar

* **Email:** praveen97uma@gmail.com

* **IRC:** praveen97uma on freenode

* **Availability on IRC:** 10:00-23:00 UTC

* **Education:** I am in the 3rd year of a 5-year Integrated M.Sc. course in Applied Mathematics at the Indian Institute of Technology Roorkee.

* **Country:** India

* **Contact:** +919997345244

**Self Introduction:**

I have been fully committed to using open source software for a year. Programming, hanging out with friends, web browsing and watching movies are some of my hobbies. I love cooking and try new recipes whenever I go home. I like adventure sports and have done river rafting, kayaking and trekking.

Why do I want to participate in Google Summer of Code?

About 7 years ago, I was in 7th standard, and at that time computers were not so prevalent due to their high cost and low availability. Also, my father was in the Army, so we had to move from one place to another. I had no exposure to computers and just knew that something called the 'Internet' existed. For this reason, my father managed to buy me a PC, but we still could not afford an Internet connection. My PC came with a (pirated) copy of Windows, and I used to play with it, trying new software that came on the CDs bundled with magazines like Digit. I was fed up that I could not find a single piece of software that I could use without paying. I used pirated and cracked software for my work, but there was always a feeling in my mind that I was using it illegally and that if they caught me, they would take me to court (I was too young!).

It wasn't until my admission to this university that I was exposed to the outer world and got enough resources to expand my knowledge of computing. I got to know about Ubuntu from my seniors, who said that you don't have to buy it: it's a 'Free' OS, unlike Windows, and more powerful. My father bought me a laptop, but I specifically asked him to buy one with no OS pre-installed. That saved him some money. I installed Ubuntu and initially faced problems switching to it from Windows, but now I am a full-time user of Linux OSs. The best part is that there is open source software for each and every need. I feel the 'Freedom': I no longer have to worry about copying and distributing software, and there is no fear of being dragged to court. I have been using open source software for the last 2 years, and it has always been my dream to be a part of an open source project.

I heard about GSoC last summer from a senior who had been selected for GSoC '10. He told me that the program provides a good platform for becoming part of an open source project: you gain a lot of knowledge, you collaborate with people around the world, and the convergence of ideas makes free software better and the world a happier place. I feel the power and freedom of using OSS. So I want to get associated with an open source project through GSoC, which will help develop my open source skills. I have been following the GSoC mailing list for a long time, watching videos posted by GSoC students on YouTube, reading blogs and improving my coding skills.

My Open Source Experience

  1. My first contribution to open source was reporting a bug [1] in the Sage mathematical library during SAGE DAYS 25, an international conference held at IIT Bombay, India last year. Sage is written in Python and is an open source equivalent of Mathematica and Matlab. The conference was organized by FOSSEE [2] (Free and Open Source Software in Education and Engineering), based at IIT Bombay, India.

  2. I was one of the contributors of a parametric plot function to the matplotlib library during the code sprint sessions organized as part of the SciPy 2010 conference held at Hyderabad, India. The conference was organized by FOSSEE, India. I also submitted some documentation diffs to John, the creator of Matplotlib, who was one of the speakers at the conference.

  3. I submitted two patches to Melange. One was a test for app.soc.logic.test_validate.py [3] and the other fixed mail_dispatcher.py [4].

Are you already involved with any open source development projects? If yes, please describe the project and the scope of your involvement.

An initiative called 'Code Arx', similar to GSoC, was started at my university to promote and create open source software. Several projects were proposed by mentors from the university's Information Management Group. I submitted a proposal to write a small-scale search engine in Python, and it was accepted. I am currently working on this project; including the mentor, we are a team of three students. I have coded an initial design of the crawler and am working on the indexer. I have used the urllib2 module for page retrieval and BeautifulSoup for parsing. I will switch to the Twisted framework as the project matures, and I hope to see it become a large project. Screenshots of the project page are at [5][6][7]. My code is at [8].

Communication skills

Though most of our developers are not native English speakers, English is the project's working language. Describe your fluency level in written English.

I am fluent in written as well as spoken English, as English was the medium of instruction in the schools and colleges where I studied. I have also communicated with community members on the mailing lists and on IRC.

What spoken languages are you fluent in?

I am fluent in English and Hindi (my mother tongue).

Do you give constructive advice?

Yes. I love to help people who are new to a task, and if they show interest in working hard to complete it, I will definitely help them. For example, one of my juniors showed interest in working with me on the project to design a small search engine. Even though he did not know the concepts of crawling, indexing and so on, I agreed to pair with him. I taught him the concepts that I knew and told him that he would now have to explore them himself and discuss any problems with me, so that we could sort them out together. It's better to have more brains working on a project: more ideas are generated and efficiency increases.

Do you receive advice well?

It's not that we know everything or can handle every difficult situation ourselves. We need the advice of people who are more experienced than us. Googling and books help only to a certain extent when one needs help with a piece of code. So yes, I have asked for help many a time, and people help a lot. I was interested in the Boost.Python project, but the skill requirements were high, so I asked the mentor openly and honestly whether I should apply for it, because it is of no use if the summer experience does not turn out to be fruitful for both of us. He was impressed and advised me to get associated with other projects that match my present skills and maybe apply next year. He was ready to help me learn the needed skills even outside the bounds of GSoC.

Are you good at sorting useful criticisms from useless ones?

Well, I feel criticism is good whether or not it is in your favor, and whether or not it is useful. Everyone has a different perspective and approach to looking at things. Something may be wrong for them but right for me. So the right thing to do is to extract something good out of positive as well as negative criticism. If someone speaks negatively about something, I should think about why he or she is saying so. If I think the person is correct to some extent, I will try to adapt.

How autonomous are you when developing? Would you rather discuss changes intensively and not start coding until you know what you want to do, or would you rather code a proof of concept to "see how it turns out", taking the risk of having it thrown away if it doesn't match what the project wants?

Well, I use both approaches as the need arises. But I prefer to discuss what has to be coded, because time is precious and it's not worth spending it on a piece of code that one ultimately has to throw away. The second approach is, however, sometimes necessary to see how things work, and it may generate new ideas or produce code that can be used elsewhere. While developing, I make full use of resources like the Internet and books to learn new and better ways to code a concept. But sometimes it becomes necessary to ask a more experienced person when I get stuck.

Project Details :

Proposal: Melange has been under rapid development, with many new features being introduced and a lot of refactoring taking place. I plotted a graph of the errors and failures that resulted when I ran the present tests over a period of two weeks. Leo has temporarily disabled the functional and unit tests, but I edited the tests/run.py module and re-enabled them so that I could take the readings. The significant number of failures and errors is due to the major changes that the Melange developers are pouring in every day. The graph also emphasizes our need for an extensive test suite and an automatic build and continuous integration system, so that developers can be notified when their proposed changes break the code. The sudden jump in errors in the graph given below was due to a "BadValueError: Property public_name is required" being raised in tests.app.soc.modules.gsoc.tasks.test_accept_proposals.py, tests.app.soc.views.models.test_org_admin.py and various other modules. The drop in failures occurred because tests that had earlier failed now produced errors instead; the two are more or less complementary.

[Figure: errors and failures reported by the test suite over two weeks]

The data for the above plot is available at [9]. I have maintained the log with the latest changeset when I ran the tests.

Following is a rough idea of the test code to production code ratio, which I calculated by counting lines of code. I used the following command to count the lines: find . -type f -exec wc -l "{}" \; | awk '{ sum += $1 } END { print sum }'
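For readers without a Unix shell, the same count can be sketched in Python. This is only an illustration of the technique, not the script used for the figures in this proposal. The helper names `count_lines` and `code_ratio` are hypothetical, and any directory names passed to them (for example `tests` and `app`) are assumptions about the checkout layout.

```python
import os


def count_lines(root):
    """Count newline characters in all regular files under root, like `wc -l`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Mirror `find -type f`: skip symlinks and non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                # Only lines that end in a newline count, as with `wc -l`.
                total += sum(1 for line in handle if line.endswith(b"\n"))
    return total


def code_ratio(test_root, production_root):
    """Return test lines / production lines, or None without production code."""
    production_lines = count_lines(production_root)
    if production_lines == 0:
        return None
    return count_lines(test_root) / production_lines
```

Calling `code_ratio("tests", "app")` from the root of a checkout would then give a single number for the ratio, assuming those two directories hold the test and production code respectively.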

[Figure: codeVsProdAnalysis, test code vs. production code line counts]

I propose to write tests for the most important modules that do not yet have them. A lot of hard work was already done by Leo and Sverre during last year's GSoC to lay a sound testing framework. The classes in the test_utils.py module, such as DjangoTestCase and MailTestCase, provide extensions to the Django TestCase, GaeTestBed and others so that new tests can be written easily for the SoC framework. As I get more familiar with the codebase, I will get up to speed writing tests. I will attempt the project in the following three phases and document the progress of each phase on my blog. The phases are:

Phase I

This phase will consist of writing unit tests for the logic and views of the gsoc module, which Sverre suggested on IRC should be the main focus. I have listed the modules for which tests are yet to be written in the figure above. For this, test_utils.DjangoTestCase, test_utils.MailTestCase and other classes can be used. HTTP requests to a URL can be simulated using the django.test.client module, which has already been integrated into test_utils.DjangoTestCase. Further extensions to test_utils can be made as the need arises.
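To give a feel for the kind of unit test this phase would produce, here is a minimal sketch using only the standard unittest module. It does not use Melange's test_utils classes or any real Melange code: `is_link_id_valid` and its pattern are hypothetical stand-ins for a logic-layer helper, chosen only to show how a test pairs accepted inputs with counter-examples.

```python
import re
import unittest

# Hypothetical rule for illustration only: lowercase letter first,
# then lowercase letters, digits or underscores.
LINK_ID_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def is_link_id_valid(link_id):
    """Return True if link_id matches the hypothetical pattern in full."""
    return LINK_ID_PATTERN.fullmatch(link_id) is not None


class LinkIdValidationTest(unittest.TestCase):
    """Checks both well-formed inputs and counter-examples."""

    def test_accepts_well_formed_ids(self):
        for link_id in ("melange", "gsoc_2011", "a1"):
            self.assertTrue(is_link_id_valid(link_id), link_id)

    def test_rejects_counter_examples(self):
        for link_id in ("", "1abc", "Has Space", "UPPER", "trailing-"):
            self.assertFalse(is_link_id_valid(link_id), link_id)


def run_suite():
    """Run the test case above and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(LinkIdValidationTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A real Phase I test would subclass the project's own test case classes instead of unittest.TestCase, so that the datastore and request simulation are available.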

Phase II

Once a sufficient number of unit tests are implemented, we can implement the functional tests. The present functional tests are reported to be broken, so I will first study them to get familiar with them. Functional testing can be done using Selenium or Twill, as suggested on the Melange wiki. Functional testing should be a priority because, as discussed on the mailing lists, the views broke on some browsers.

The tests written in Phase I and II will hopefully form part of the Smoke Tests.

Phase III

If time permits, we can fix the Pylint errors to maintain a healthy codebase. We can also attempt to set up an automatic build and continuous integration system using BuildBot. BuildBot has a BuildMaster that distributes work to BuildSlaves to handle the workload efficiently. It can also be configured to test the development trunk automatically, both periodically and whenever a developer submits changes.

Project Plan:

  • My end-semester exams will end on 13 May, so until the end of the Community Bonding Period on 23rd May I will have ample time to augment my skills and increase my familiarity with the framework. I will study the present tests more thoroughly. I will adopt the test-based learning mentioned in the project idea, because I also found it more productive while I was writing tests as a pre-GSoC exercise. I have a copy of the book "Programming Google App Engine" by Dan Sanderson, as suggested by Leo in his post on Melange's blog, which I will read properly to increase my knowledge of Google App Engine. I will also read 'Django 1.1 Testing and Debugging' by Karen M. Tracey to improve my testing skills.

  • I will start a blog to document my efforts and experience while dealing with the codebase. I found Leo's posts on Melange's blog quite helpful for getting familiar with the codebase and for the approach to learning it. I got to know about tools like 'tree' and 'Pylint.pyreverse', which Leo used to generate the directory structure and the class and package diagrams. They flattened the learning curve a lot, especially when I was new to this framework.

  • I will be in constant touch with the mentor and the community, and as I finish each test I will put it up for review so that it can be integrated into the codebase as soon as possible.

  • I have no commitment other than GSoC during the summer, so I will be able to give 10-12 hours per day, 6 days a week. College will reopen on 16th July, which is during the mid-term evaluations, and after that I will be able to devote only 30-35 hours per week.

Rough Timeline

**May 15 - May 23 (Community Bonding Period)**

  • Set up a new development environment

  • Get to know the community

  • Start a blog

  • Increase familiarity with the codebase and other tools

  • Read the books 'Django 1.1 Testing and Debugging' and 'Programming Google App Engine'

May 24 - June 27 (5 Weeks)

  • Implement Phase I
  • Write unit tests for app.soc.modules.gsoc
  • Fix Pylint errors for the new code and update my blog

June 28 - July 26 (4 Weeks)

  • Implement Phase II
  • Implement functional tests and do in-browser testing.
  • July 15 and 16 will be spent traveling from home to college, as college reopens on July 17

July 27 - Aug 22 (3.5 Weeks)

  • Implement Phase III
  • Code cleaning and documentation

Why did you choose this project?

I chose this project because it best suits my present skills and capabilities. Thanks to my experience in web development and proficiency in Python, I get an intuitive idea of what a particular module was written for, what its role in the SoC framework may be and how it may be used. This intuition about how a module should work is essential when writing tests, because it lets me find the counter-examples where the tests should fail. To explain my point better, consider a Boost project in which support for numpy.ndarray has to be extended in the Boost.Python libraries. At present they just import a Python module and wrap it in C++ for use in Boost.Python, which provides only limited support. The solution is to use the NumPy C-API to extend the functionality of NumPy arrays and integrate it into Boost.Python. I have worked with numpy.ndarray, but only through Python, not through the C API. I am good at C++ but have never used Boost.Python. So even though I know C++ and Python, I would find it difficult to extend that functionality, because I do not have an intuitive feeling for how the code should work and I am not familiar with how the internals interact with each other.

In the case of this project, for example, the SoC framework defines a user, its role and various other attributes. A user such as a mentor or a student is allowed to do only certain types of tasks, and each has different access rights. A student cannot see the proposal of another student if it is not public. Because I have been exposed to web development before and am familiar with such ideas, I can give my 100% to this project.

**What do you expect to gain from this project?**

This project will enhance my development skills in Python as well as my knowledge of new technologies like Google App Engine. I will get to know the SoC framework, and I will gain the confidence to start such a large project myself in the future. I also want to make open source development my career, so this project will be a stepping stone towards it.

What would make you stay in the Melange community after the conclusion of GSoC?

After the two weeks that I have spent reading conversations and discussions on the mailing list and IRC, Melange is to me not a community but a family, and I aspire to be a part of that family. It is interesting to see how SRabbelier assigns specific tasks to each member of the SoC team and everyone willingly sets out to complete them. GSoC, for me, is an opportunity to get associated with an open source project. Learning never ends, so even after the conclusion of GSoC there will be a lot to learn and experience. The tests have to be maintained as future changes and revisions are made to the codebase. And of course it feels great to see your code being helpful to the open source community, and I want to see myself as a mentor in next year's GSoC. The Melange community has been the most helpful of all the open source projects I have followed.

Practical considerations

Are you familiar with any of the following tools or languages?

  • Mercurial (used for all commits)
  • HTML/CSS/Javascript (used in the frontend)
  • Python (language used in the backend)
  • AppEngine (platform used for the backend)
  • Django (framework used in the backend)

I have used GitHub, and all my repositories are public on my GitHub account. I came to know about Mercurial through a senior who worked for Mercurial as a GSoC student last year. Also, during my pre-GSoC exercise, I learned about creating patches, committing and updating the local repository in Mercurial by reading resources suggested by madrazr, SRabbelier and the community on IRC.

I have significant knowledge of HTML and CSS and have created websites for events during our technical festival. I have not written much JavaScript, only what was required for particular functionality during web design, which I wrote by following online tutorials. Two of my web projects in PHP with a MySQL back-end are at [10].

Python was introduced to me by FOSSEE (Free and Open Source Software in Science and Engineering), Mumbai, India last year during the SAGE DAYS 25 conference held at IIT Bombay, India. I was surprised by its power and ease of use. Since then, I have been actively using Python for solving problems and puzzles on www.projecteuler.net and in academic work. I am also using Python for my project on designing a small-scale search engine, as mentioned in the experience section. I am comfortable using Python.

I read about Google App Engine in an article on the blog www.techcrunch.com. It mentioned the 2008 release and that only Python was supported at the time. I have no prior development experience with Google App Engine, but I gained familiarity with it while writing the tests for app.soc.logic.validate.py and app.soc.logic.tags.py as a pre-GSoC exercise. I read the online documentation as well as the book 'Programming Google App Engine' by Dan Sanderson, as suggested by Leo in one of his posts on Melange's blog. Reading the book clarified concepts such as the difference between a Datastore and a database, and various data structures such as db.Model that I encountered while studying the modules in the SoC codebase.

I am familiar with Django but do not have much development experience with it. I have written small projects, but only as part of tutorials while learning Django. I am well aware of the Model, View and Controller concepts, which I read about while learning the Ruby on Rails web framework.

Which tools do you normally use for development? Why do you use them?

The two main languages that I work in at present are Python and C++. For C++, I use Code Blocks because I like its interface and I can work on multiple projects simultaneously. It also reports the execution time of my C++ programs, which I need to know because I am taking a course on Data Structures and Algorithms. I did not want to use the C++ ctime library or the time command in the bash shell to measure execution time. For small programs, I use vim and gedit as text editors to write the source code and the g++ compiler to compile it.

For Python, I use Eclipse with the PyDev plugin. Earlier I used Aptana. IDEs are good for handling a large number of files in a project, and I make use of one in my Code Arx project. Eclipse saves a lot of time because it highlights indentation errors, incorrect keywords and syntax errors, which can then be easily corrected. Eclipse also offers suggestions for code completion, which I like a lot. Also, I do not have to open a terminal to run a Python program. So Eclipse is my favorite IDE for Python. Gedit and vim, on the other hand, do not possess such features, so I use them to write small Python programs and test them in the terminal.

'touch', 'more', 'less' and 'grep' are some of the common shell commands that I use. I used 'grep' to search for where a particular function or class was used in the codebase. For example, I grepped for the string 'GSoCOrganisation' in the complete soc directory to learn where and how it was used while I was writing a test for app.soc.logic.tags.py. I had finished the test code for 'tags.TagsService.prepareTagsForStoring' and 'tags.TagsService.setTagValuesForEntity', and when writing the code for 'removeTagsForEntity' I needed an entity that could be tagged. I asked madrazr on IRC about taggable entities in the framework, and he suggested the 'GSoCOrganisation' and 'GCIOrganisation' entities. I chose GSoCOrganisation, grepped for the string and located a file where it was used, so that I could study how to use and initialize it in my test code. 'grep' is a great tool for determining where a function is used in the codebase. It also shows how frequently the function is used, which gives an idea of its importance.
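The same kind of search, including the usage count, can be sketched in Python for cases where grep is not at hand. This is an illustrative helper rather than part of any existing tool: the name `find_usages` is hypothetical, and the search root and identifier are passed in by the caller.

```python
import os


def find_usages(root, needle, suffix=".py"):
    """Map each matching file under root to the line numbers mentioning needle."""
    usages = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on badly encoded files.
            with open(path, encoding="utf-8", errors="replace") as handle:
                hits = [number
                        for number, line in enumerate(handle, start=1)
                        if needle in line]
            if hits:
                usages[path] = hits
    return usages


def usage_count(usages):
    """Total number of matching lines across all files."""
    return sum(len(hits) for hits in usages.values())
```

For instance, `find_usages("app/soc", "GSoCOrganisation")` would list the files and line numbers to study, assuming the soc package lives under `app/soc` in the checkout, and `usage_count` on the result gives the rough frequency of use.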

Would you mind talking with your mentor on telephone / Internet phone?

I would love to talk to my mentor over the phone, and I think it is one of the best ways to invest Google's money in the project. I would also spend the money on good reference books and a good Internet connection.

I will be happy to provide any other information the organization needs.

External Links:

[1] http://trac.sagemath.org/sage_trac/ticket/9710

[2] http://fossee.in

My Patches submitted to Melange

[3] http://code.google.com/p/soc/source/detail?r=7210374952fbf60b2836c99c1d2ff70e1ae7f427

[4] http://code.google.com/p/soc/source/detail?r=8843114c35b5b5e325a7d656d9f7cd891253a002

Code Arx project page screenshots

[5] http://img7.imageshack.us/img7/1231/codearxscreenshot.png/

[6] http://img856.imageshack.us/img856/7019/codearxprojectpage.png/

[7] http://img822.imageshack.us/img822/7024/codearxhomepage.png/

My Web Spider Project

[8] https://github.com/praveen97uma/Web-Crawler

[9] https://github.com/praveen97uma/GSoC-Docs/blob/master/errosLogMelange.txt/

My Web Projects

[10] https://github.com/praveen97uma/Web-Projects