Open Source Presence Infographic of Indian Startups
- Source Code supplement for my post : Open Source Presence Infographic of Indian Startups
- How to use it
- Terms and Conditions
Technically speaking, the organizations used in this report are no longer just startups, but I hope you people won't mind this and aren't gonna launch a drone at me.
I think something is clear from the name itself, isn't it? Well! It should be.
This report tries to plot all the involved organizations on the Open-Source portal. It tries to show what the different organizations, in the race to achieve their goals, are doing there for and in the community.
It's pretty biased though, because this report uses only one platform of the Open-Source community, GitHub.
I think it was around mid-December last year when I saw an interview with Flipkart's CTO Amod Malviya in a YourStory article. I started reading it and kept reading till the end. At the end my reaction was: wow, this man is awesome, and he indeed is. I have watched many of his talks since reading that interview.
That interview made a different impression on me. I liked the part where he talked about building a top-class internet infrastructure in India. I don't know what you people think of Flipkart, Myntra, etc., but what I think is that they are evolving continuously, at least on the technical side. That's why they are in the marathon and Amazon itself is in the race with them.
So, after a while I found myself on Flipkart's GitHub organization, scrolling through their projects there. Then the idea of this report popped up in my mind and here I am, struggling with it.
##For What Joy?
Is there a need? The earth will keep rotating without this report, but it's kinda necessary for technical organizations to be a part of the current Open-Source era. I mean, as they say in group dynamics, if you're part of a group then you learn from other members and they learn from you.
Do you remember something named Facebook? Let's take an example from them.
Maybe you take PHP as a language for kids, but keep in mind that The Social Network was initially developed in that same PHP. But as they started growing and hitting glitches with it, and seeing that Santa was not coming to help them, they attempted to build something of their own. Today, we know those inventions as HHVM and the Hack language.
So, the thing is: don't wait for Santa, and build cool things that matter. Big organizations are already doing it, be it hhvm and react by Facebook, typeahead.js by Twitter, web-starter-kit by Google or many more by others.
I do believe that the organization selection part was a bit biased, as I wanted to have my favorite organizations first on the list, like HackerEarth, HasGeek, Housing, Flipkart, Wingify and Zomato.
It was disappointing to see that Housing was not on GitHub at that time and that Zomato's organization had zero public activity.
Finally, I selected 15 startups, giving priority to my favorite ones.
- Cucumbertown - Follow great cooks, showcase your cooking, build a following
- Exotel - Reliable Cloud Telephony System for your business
- Flipkart - Online Shopping India
- Freshdesk - Online customer support software and helpdesk solution
- HackerEarth - Programming challenges and Developer jobs
- HasGeek - HasGeek organises events for geeks
- Instamojo - Easiest Way to Collect Payments Online
- Myntra - Online Shopping India
- MySmartPrice - Compare the best prices from online retailers
- Practo - Find Best Doctors and Book Appointments Online
- ShepHertz - Complete Cloud Ecosystem for App/Game Developers
- Urban Ladder - Furniture Online Shopping Store
- WebEngage - On-Site Customer Engagement Suite
- Wingify - Website Optimization tools that simply work
- Zomato - Discover great places to eat around you
There is a section in this report that uses last year's GitHub activity of the organizations, so I killed my idea of replacing Zomato with someone else, as the year was gone and it was kinda tough to jump the traditional API bumper and collect the data.
As I said, Zomato had zero public activity last year, but that doesn't mean they are not good. They are doing pretty well: acquiring it all at hurricane wind speed and serving more cities than you've ever been to in your life. Maybe they are using some other platform, a local Git hosting or something.
You'd better zoom in on the images or open them in a different tab.
##1. Appearance Timeline of Organizations
Do you know when all of these organizations were founded? Not sure?
This plot shows the relative appearance of the selected organizations, both in the public world and in the open-source world.
Add legend text in the image.
- I didn't know that Myntra was founded a bit earlier than Flipkart, which acquired the older player recently.
- Myntra and Flipkart came into existence before GitHub itself.
- We can see a large gap between the appearance on these two portals for Flipkart, Myntra and Zomato, Myntra being the slowest one to join.
- Some organizations like Instamojo, HackerEarth and HasGeek felt the need of the hour and lost no significant time in joining.
Well! In case you're thinking that this information is all chatter, let me present something interesting.
Go back and look at the image carefully and you'll notice that Cucumbertown and HasGeek are different from the others.
Yes! The GitHub organizations for these two were created before their public launch itself. Sounds interesting, right?
I can't speak for Cucumbertown right now, but I can present a theory to support this for HasGeek.
Do you guys remember the first event that HasGeek organised? It was DocType HTML5, you silly. The event was held in October 2010 and HasGeek was publicly launched in December 2010. You can fly to their GitHub account and check that they have been developing hasgeek/doctypehtml5 since then.
Maybe organising this event was the inspiration behind launching HasGeek. I need to hear HasGeek founder Kiran's words on it, though.
##2. Repository Status
As we all know, the repository is an important component of GitHub's ecosystem.
###2.1. Public Repository Status
Cloud services provider ShepHertz has the maximum number of public repositories there, mainly based on their App42 service stack. Flipkart and HasGeek also have a significant number of repositories; the rest of the organizations are building their store gradually.
The number of repositories on GitHub is not the right thing to measure, though.
###2.2. Stars Distribution
As I said, having a larger number of repositories doesn't by itself show your popularity. It's not one of those old wars between states where the king with more elephants was supposed to be the winner.
This graph represents the distribution of stars across all the repositories of the involved organizations.
Top 10 repositories according to no. of stars
- wingify/please.js · 211 ☆
- flipkart/HostDB · 190 ☆
- myntra/MYNStickyFlowLayout · 133 ☆
- wingify/dom-comparator · 129 ☆
- hasgeek/lastuser · 113 ☆
- flipkart/phantom · 71 ☆
- hasgeek/hasjob · 70 ☆
- wingify/agentredrabbit · 49 ☆
- wingify/lua-resty-rabbitmqstomp · 45 ☆
- hackerearth/hackerearth.vim · 28 ☆
You can see Wingify, Flipkart and HasGeek are ruling the leader-board here.
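The ranking above can be reproduced from a plain mapping of repository names to star counts. This is only a minimal sketch: the numbers are the ones from the top-10 list, while the function names `top_repositories` and `stars_per_organization` are hypothetical and not taken from the project's source.

```python
# Star counts copied from the top-10 list in this section.
STARS = {
    "wingify/please.js": 211,
    "flipkart/HostDB": 190,
    "myntra/MYNStickyFlowLayout": 133,
    "wingify/dom-comparator": 129,
    "hasgeek/lastuser": 113,
    "flipkart/phantom": 71,
    "hasgeek/hasjob": 70,
    "wingify/agentredrabbit": 49,
    "wingify/lua-resty-rabbitmqstomp": 45,
    "hackerearth/hackerearth.vim": 28,
}


def top_repositories(star_counts, n=10):
    """Return the n (repository, stars) pairs with the most stars."""
    ranked = sorted(star_counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def stars_per_organization(star_counts):
    """Sum stars per organization; the part before the slash is the organization."""
    totals = {}
    for full_name, count in star_counts.items():
        org = full_name.split("/", 1)[0]
        totals[org] = totals.get(org, 0) + count
    return totals


top_three = top_repositories(STARS, n=3)
org_totals = stars_per_organization(STARS)
```

Keep in mind that `org_totals` only adds up these ten repositories, so it is not the full star count of any organization.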
###2.3. Relative Repository Attributes
GitHub provides a feature named fork; using it, you can contribute to the awesome projects of others as if they were your own.
This plot shows which organizations own all of their source repositories and which ones have forked repositories.
During the development, I also calculated active and inactive percentage of the forked repositories. You can have a look here at how this was calculated.
We can see that HasGeek is doing fairly well here, having a larger share of source repositories than forked ones. A large portion of Flipkart's and Freshdesk's repositories are inactive forks.
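For a rough idea of how such shares could be computed, here is a minimal sketch. It assumes repository records shaped like the GitHub API's repository objects (`fork`, `created_at`, `pushed_at`). The rule for calling a fork "active" (it was pushed to after the fork was created) is my assumption for illustration and may differ from the calculation actually used in this report; the sample data is made up.

```python
from datetime import datetime


def parse_time(stamp):
    """Parse an ISO 8601 UTC timestamp such as 2014-01-01T00:00:00Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


def repository_shares(repos):
    """Return the percentage of source, active-fork and inactive-fork repositories."""
    counts = {"source": 0, "active_fork": 0, "inactive_fork": 0}
    for repo in repos:
        if not repo["fork"]:
            counts["source"] += 1
        # Assumption: a fork counts as active if someone pushed after forking.
        elif parse_time(repo["pushed_at"]) > parse_time(repo["created_at"]):
            counts["active_fork"] += 1
        else:
            counts["inactive_fork"] += 1
    total = len(repos)
    if total == 0:
        return {kind: 0.0 for kind in counts}
    return {kind: 100.0 * n / total for kind, n in counts.items()}


# Hypothetical sample data, not taken from any real organization.
sample_repos = [
    {"fork": False, "created_at": "2013-01-01T00:00:00Z", "pushed_at": "2014-02-01T00:00:00Z"},
    {"fork": False, "created_at": "2013-05-01T00:00:00Z", "pushed_at": "2013-05-02T00:00:00Z"},
    {"fork": True, "created_at": "2013-06-01T00:00:00Z", "pushed_at": "2013-09-01T00:00:00Z"},
    {"fork": True, "created_at": "2013-07-01T00:00:00Z", "pushed_at": "2013-03-01T00:00:00Z"},
]
shares = repository_shares(sample_repos)
```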
##3. Development Activity
All the involved organizations have something for the community: projects born as solutions to some problem, projects born in hackathons and so on. They're gradually building things to enhance their infrastructure and market position.
###3.1. Repository Creation
- You can see that Urban Ladder, HasGeek and Exotel created their first repository at almost the same time as their GitHub organization.
- ShepHertz, HasGeek and Flipkart have kinda continuous repository creation events throughout the timeline.
Again, if you think that it's general knowledge, then let me show you the magic.
Go back and look at the image carefully and you'll notice something weird for HackerEarth, won't you?
Yes! you see there, HackerEarth's first repository was created before creation of their GitHub organization itself. How is this even possible?
Well! ladies and gentlemen, this is possible. Let me introduce a new theory in support of this.
HackerEarth's oldest repository in the time series is django-storages, and it's this repository that creates the confusion. The fact is that it was initially forked by HackerEarth's co-founder Vivek on his own GitHub account. After a separate organization was created for HackerEarth, he moved that repository to the organization.
That's why this repository's creation date is before creation of their organization. Well! again, I need Vivek's approval on this.
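Spotting such cases is a simple date comparison. Here is a minimal sketch, assuming each repository record carries a `created_at` timestamp in the format the GitHub API uses. The function name `repos_older_than_org`, the repository names and all the dates below are made up for illustration.

```python
from datetime import datetime


def parse_time(stamp):
    """Parse an ISO 8601 UTC timestamp such as 2013-03-01T00:00:00Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


def repos_older_than_org(org_created_at, repos):
    """Return the names of repositories created before the organization itself."""
    org_time = parse_time(org_created_at)
    return [repo["name"] for repo in repos if parse_time(repo["created_at"]) < org_time]


# Hypothetical dates and names: one repository predates the organization.
sample_org_created_at = "2013-03-01T00:00:00Z"
sample_repos = [
    {"name": "moved-in-repo", "created_at": "2012-11-15T10:30:00Z"},
    {"name": "born-here-repo", "created_at": "2013-06-20T08:00:00Z"},
]
older = repos_older_than_org(sample_org_created_at, sample_repos)
```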
###3.2. Commit Activity
This plot shows the weekly commit activity of all the organizations. It's pretty mixed-up, but this was the only plot type I had in mind at the time I was developing this.
You can see relatively more development activity at the start of the year.
Flipkart's development team keeps a copy of Linux, though it's not marked as a forked repo. I removed its activity because it was making the plot even more cluttered. You can check that plot too, though.
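Summing weekly commits per organization while leaving out a noisy repository could look like the sketch below. It assumes per-repository weekly records with `week` (a Unix timestamp for the start of the week) and `total` (commits in that week), which is the shape GitHub's commit activity statistics use. The function name `weekly_totals`, the repository names and the numbers are hypothetical.

```python
from collections import Counter


def weekly_totals(activity_by_repo, exclude=()):
    """Sum weekly commit totals over all repositories, skipping excluded ones."""
    totals = Counter()
    for repo_name, weeks in activity_by_repo.items():
        if repo_name in exclude:
            continue
        for entry in weeks:
            totals[entry["week"]] += entry["total"]
    # Return the weeks in chronological order.
    return dict(sorted(totals.items()))


# Hypothetical sample data: two ordinary repositories and one huge mirror.
sample_activity = {
    "app": [{"week": 1388880000, "total": 3}, {"week": 1389484800, "total": 1}],
    "lib": [{"week": 1388880000, "total": 2}],
    "linux": [{"week": 1388880000, "total": 500}],
}
with_mirror = weekly_totals(sample_activity)
without_mirror = weekly_totals(sample_activity, exclude={"linux"})
```

Comparing `with_mirror` and `without_mirror` shows why a single very busy repository can drown out everything else in the plot.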
##4. Technology Stack
Different organizations are working in different fields of technology, be it medical services, developer events, online shopping, food, cloud services or online payments, so they encounter different problems along the way and handle them accordingly.
###4.1. Programming Languages in Production
This plot uses colors from GitHub's linguist for different programming languages.
This helps us understand the tech stack of all the organizations.
- Flipkart uses Java, HasGeek uses Python, Practo uses PHP and Freshdesk uses Ruby as their major programming language.
- Organizations have started using non-traditional languages like Lua, Erlang and Scala.
- ShepHertz uses the maximum number of programming languages (14), in their quest to support every in-demand programming language in their service.
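Observations like these boil down to counting the primary language of each repository. Here is a minimal sketch, assuming repository records with a `language` field that may be empty, as in the GitHub API's repository objects. The function name `language_profile` and the sample data are hypothetical.

```python
from collections import Counter


def language_profile(repos):
    """Count repositories per primary language and pick the most common one."""
    counts = Counter(repo["language"] for repo in repos if repo.get("language"))
    major = counts.most_common(1)[0][0] if counts else None
    return {"major": major, "distinct": len(counts), "counts": dict(counts)}


# Hypothetical sample data; repositories without a detected language are skipped.
sample_repos = [
    {"name": "service-a", "language": "Java"},
    {"name": "service-b", "language": "Java"},
    {"name": "scripts", "language": "Python"},
    {"name": "docs", "language": None},
]
profile = language_profile(sample_repos)
```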
###4.2. Field of work
This section deals with the fields that the different organizations are working in.
To calculate the results, I used the repository names and their descriptions. What I actually wanted was the relative share of each field of work for all the organizations.
So, initially, my plan was to use Latent Dirichlet Allocation on the repository-description-text corpus for Topic Modeling.
There, I used the concatenated repository descriptions of each organization as one document, but then I dropped this idea because of the asymmetrical repository distribution. It resulted in a corpus of only 14 documents (Zomato excluded).
You can read a brief introduction to LDA here.
Then I changed the plan and moved to a Naive Bayes classifier, using word frequencies only.
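To give an idea of what a word-frequency Naive Bayes classifier looks like, here is a minimal sketch of a multinomial variant with add-one smoothing. The labels, training texts and class name are made up for illustration; the features and categories actually used for this report are not described here, so don't take this as the exact method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocabulary = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the label with the highest log-probability for the text."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            label_total = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen (label, word) pairs above zero.
                score += math.log((count + 1) / (label_total + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training texts in the style of repository descriptions.
classifier = NaiveBayes()
classifier.train("django app for api documentation", "web")
classifier.train("redis cache mysql proxy", "databases")
predicted = classifier.classify("mysql and redis client")
```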
So, some of the topic results from the classifier for the organizations are:
- Cucumbertown : Django, Gearman, Email, Commit, Notifier
- Exotel : Audio, IVR, Music, SMS
- Flipkart : REST APIs, MySQL, Lucene, Redis, HTTP proxy, load balancer
- Freshdesk : Databases, Rails, API, Websockets, Socket.io, YUI, Resque
- HasGeek : Workshop, Lastuser, App management, TV, Job, GitHub
- HackerEarth : Django, API documentation and clients, extensions and editors
- Instamojo : API clients, Wordpress, Frameworks, Huxley
- Myntra : iOS, Cocoa, Android, ElasticSearch, Docker, Librato
- MySmartPrice : Technology Blog, Gearman workers, Cookbook
- Practo : OpenID, Flask, Sentry, Symfony, Raven, Mail clients, Messages
- ShepHertz : App42, PaaS, SDK, API clients, MongoDB, MySQL, Redis
- WebEngage : Message, API, Website, Speech
- Wingify : Angular.js, DOM, RabbitMQ, iOS, Data, Bootstrap, VWO
Here we can see that Flipkart's stack includes things related to distributed computing, networking and databases. Wingify's stack, on the other hand, includes things related to frontend, data and networking.
If you're thinking that Santa helped me with all this, then you are wrong, my friend. I was all alone every time: thinking about it, collecting the data, managing R source files in RStudio, writing Python for it and all that.
If you feel that you can do something much more awesome than this, you can do whatever you want; it's hosted on GitHub at pravj/ospi.