Google Summer of Code 2018 Ideas

Shekhar Prasad Rajak edited this page Mar 22, 2018 · 29 revisions

Ideas for Google Summer of Code 2018.

Contact

Feel free to reach us by joining #sciruby on chat.freenode.net or via our mailing list.

IMPORTANT NOTICE: SciRuby encourages diversity. Scientific progress in general benefits from diversity and software development for science is no exception. We are really happy that the number of people from Asia, Africa and South America applying for GSoC projects is increasing. Our org admin this year is from India, our previous org admin was from Brazil. We have had students from Japan, India, Sri Lanka, Russia, etc. We have women software developers in our programme. We are happy to hear from you all!

Instructions for students

We strongly recommend that you pick one of the ideas listed below. We value contributions in advance of GSoC, even if they're just little ones. Go pick out something in one of our trackers and work on it, talk to folks on the listserv, and get an idea for what features are needed.

You don't need to know a lot about Ruby to work on a project: depending on how much you already know, it'll be pretty easy to learn enough to be able to contribute. However, you may need some familiarity with scientific computation. If you don't have any, take a look at "Numerical Recipes in C", which you'll probably find in your university's library.

In any case, if you feel your skills aren't enough for some project, please ask us on our IRC channel (see contact section above) or our Google Group (see sciruby.com to sign up) and we can help you.

See also:

Read this before you commit your first patches

SciRuby's main landing page on GitHub mostly holds the stable versions of SciRuby gems, but developers and contributors should work on the very latest (bleeding edge) repositories to make sure that changes can be committed without conflicts.

Try reading Finding The SciRuby Development Repositories on Github if you would like a brief introduction on finding the latest development gems to work on from Github. Also go through the coding guidelines before sending your first patch.

How to submit a patch ("pull request")

Here's a great tutorial: http://www.thinkful.com/learn/github-pull-request-tutorial/

Have a look and feel free to ask if you have any questions.

Instructions for mentors

Guidelines for mentors to submit projects:

  • Specify the name of your project as a heading.
  • Write a paragraph or two with further details.
  • Write a small 'Skills' section detailing the skills that the student must possess to complete the project.
  • Write down your own GitHub handle and contact details in a 'Mentor Details' section through which the student can contact you.
  • If anyone else wants to co-mentor a project, please specify your details along with the mentor's details.

Project Ideas

Visualization projects

Ruby Matplotlib

There exist several plotting libraries for Ruby, but none of them is as readily usable as either Matlab's plotting system or the Matlab-inspired Python Matplotlib. This lack of a Matlab/matplotlib-compatible Ruby plotting library is the single biggest obstacle to Ruby's use as a scientific language.

This project is longer term than just one GSOC, but the Ruby Science Foundation is prepared to allocate funds to provide an ongoing grant for development of such a library.

Several approaches have been discussed:

  1. Provide a Ruby API for the C code in Python matplotlib. This approach has been considered in the past, but is almost certainly infeasible, as matplotlib is extremely tightly coupled with Python.
  2. Provide a Ruby-to-Python bridge to expose Python matplotlib in Ruby. The largest problem with this strategy is the challenge of debugging across three languages. Suppose a plot call doesn't work; is the problem with the underlying C code, the Python code which accesses the C code, or the Ruby–Python interface?
  3. Rewrite matplotlib in a language-independent form and expose it to Ruby. This is an enormous task, but a key advantage is the language independence; other languages' communities could provide APIs as well, and thus provide some of the locomotive force for development of such a language-independent tool. This would likely need to be attempted in C++ rather than C (due to ready availability of data structures).
  4. Write a Ruby matplotlib from scratch. This approach should be significantly easier than the language-independent matplotlib, but may not have as broad an appeal. It allows you to use Ruby's native data structures (or possibly NMatrix) for storage, thus relying on underlying C (and/or Java) code so you have less rewriting to do.
  5. Other? Perhaps you can think of something we've missed. Remember that GSOC with SciRuby is about researching and presenting the best solution, and then following through on it; and collaboration with others.
  • Mentors: John Woods (@mohawkjohn) for approach 3, 4, or 5.
  • Co-mentors: Rakhi Sharma (@rakhisharma)
  • Recommended skills: C/C++
  • This project may be able to accommodate multiple students with proven teamwork skills.

Advanced features in daru-view

Learn the basics of daru-view from sciruby/blog or daru-view/wiki.

One of the good features of daru-view is that we can use all the options the underlying plotting tool already has, by accessing the chart/table class using #chart or #table. However, we rely on the google_visualr and lazy_high_charts gems to access the features of Google Charts JS and HighCharts. These dependent gems cannot access all the features, and Google Charts, HighCharts and DataTables keep being updated. So we need to update or add methods in daru-view.

Google charts tool

  • Google Charts has more features now, and its developers keep updating it. It is important for us to extend the google_visualr gem's code in daru-view, i.e. add more methods in daru/view/adapter/googlecharts and use the features of the Google Charts JS directly. (Contributors can come up with more ideas by searching the Google Charts site.)

  • The DataView, ChartWrapper and ChartEditor classes need to be implemented in daru/view/adapter/googlecharts.

  • Export the chart in various formats: in Google Charts we can get the image URL using chart.getImageURI() and then download the chart in different formats. Refer to this link. In other words, we want to download the chart in various formats directly from code.

HighCharts

  • Similarly, Highcharts JS is updated on the official site and more features keep being added. So we must keep updating the daru-view adapter/highcharts, i.e. extend the lazy_high_charts code in daru-view. (Contributors can come up with more ideas by searching the Highcharts site.)

  • Currently daru-view only uses Highcharts JS, but more features are available through HighMap and HighStock. Implementing these in daru/view/adapter/highcharts would be very useful, and they would also be usable offline (since the daru-view Highcharts adapter can be used fully offline).

There are many examples on the Highcharts -> HighStock site and some advanced examples in the blog as well (like this). Can we do something like this using daru-view?

  • Custom CSS styling in HighCharts. Refer.

  • We can export the chart in various formats (like jpg, png, svg, pdf) when we view it in a browser (using an IRuby notebook or a web app, a download button appears in the right corner if the export module is loaded). So there should be an API (like chart.export_png, chart.export_pdf, chart.export_svg, ...) with which we can download the chart directly from code.

DataTables

daru-data_tables was created only for use in the daru-view gem, but it is not yet complete. Currently it can display tables in web applications but not in IRuby notebooks. Refer to this link.

We want to load large data sets piece by piece, as I discussed here, but this is not implemented yet.

Refer to this blog post for more info.

There are many features we can add to daru-view/DataTables. Contributors should come up with more ideas.

Updating js files

  • It must be possible to update the Google Charts JS and Highcharts JS files whenever the user wants. These dependent JS files are updated at the official links. Refer to this issue.

  • To make daru-view work offline, it loads the JS files into the IRuby notebook and the web application. But when you look at the HTML source, you see a large number of inlined lines because of the JS. We should have something like this

//= require daru-view/highcharts/highcharts
//= require daru-view/highcharts/highcharts-more
//= require daru-view/highcharts/highstock
//= require daru-view/googlecharts

to load the JS files. Refer to this comment.

Note:

Contributors may need to understand the codebases of nyaplot, google_visualr, lazy_high_charts and daru-data_tables to extend their features. In the near future we may remove these dependent gems (by moving all the code we require into daru-view) and use only daru-view code.

I am not expecting changes in the dependent gems, since our requirement is to make daru-view usable with the daru gem, in IRuby notebooks and in Ruby web applications.

  • These are good to start with:
  1. daru-view/issues/67
  2. daru-view/issues/66
  3. There are many examples on the GoogleCharts and HighCharts sites that should also be present in the spec/dummy_iruby examples. More examples will help us discover more features, and any bugs that are present. Refer.

Questions

  • Can we create advanced charts like this by extending the daru-view/highcharts code? (i.e. by creating methods. How will the data be sent? How will the options that modify the charts be set?)

  • Is it a good idea to have some kind of plugin system (in the sense that the user can add more adapters for plotting, using a command like daru-view add_adapter abc_charts)? The user would then be able to use the new adapter. I don't think it is a good idea to have many adapters in daru-view itself. It must be easy to extend when required.

From a developer's point of view, developers should be able to use a rake task to generate a template for a new adapter xyzAdapter (so the rake task will generate daru/view/adapters/xyzAdapter.rb, and a folder daru/view/adapters/xyzAdapter for helper methods). See the gist link for a template of the xyzAdapter.rb file.
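One way to picture the plugin idea is a small registry that adapters add themselves to. The sketch below is plain Ruby and does not use daru-view's real API: AdapterRegistry, register, fetch, names and XyzAdapter are all hypothetical names chosen for illustration.

```ruby
# Hypothetical sketch: a registry that lets plotting adapters be added
# without changing the core library. None of these names exist in daru-view.
module AdapterRegistry
  @adapters = {}

  class << self
    # Register an adapter class under a symbolic name.
    def register(name, klass)
      @adapters[name.to_sym] = klass
    end

    # Look up a registered adapter, failing loudly for unknown names.
    def fetch(name)
      @adapters.fetch(name.to_sym) do
        raise ArgumentError, "unknown adapter: #{name}"
      end
    end

    # Names of all adapters registered so far.
    def names
      @adapters.keys
    end
  end
end

# A third-party adapter only needs to respond to a common interface.
class XyzAdapter
  def init(data, options = {})
    { adapter: :xyz, data: data, options: options }
  end
end

AdapterRegistry.register(:xyz, XyzAdapter)
chart = AdapterRegistry.fetch(:xyz).new.init([1, 2, 3], { title: 'Demo' })
```

With a registry like this, the core gem would only need to know the common interface, and a generated adapter template could end with its own register call.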

Related links

About project

  • Skills: Basic knowledge of Ruby, JavaScript and Ruby web application frameworks.
  • Mentor: Victor Shepelev (@zverok), Sameer Deshmukh (@v0dro), Shekhar Prasad Rajak (@Shekharrajak)
  • Difficulty: Moderate.

Red Visualizer

https://github.com/red-data-tools/red-visualizer

About this project

This new project aims to make a new high level data visualization system that helps Rubyists understand their data well, by letting them work seamlessly with both the data and its visual representation.

With Red Visualizer, instead of building charts using a plotting library directly, you first describe your data for making it visualizable, then you specify additional metadata for realizing the visualization details.

The concept of Red Visualizer is borrowed from Python's HoloViews and Julia's Plots ecosystem.

The goal of this project is supporting multiple data sources and multiple plot backends. We want to support the following data sources:

  • Plain collection objects:
    • Array
    • Enumerable
    • Hash
    • PyCall::List (Python's list through pycall)
    • PyCall::Dict (Python's dict through pycall)
    • etc.
  • Data frames
    • Daru
    • Red Arrow
    • pandas's DataFrame through pycall
    • etc.
  • Numerical arrays:
    • Numo-NArray
    • NMatrix
    • GSL's Matrix
    • RMagick's Image
    • NumPy's array through pycall
    • etc.
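To make the idea of many data sources feeding one visualizer more concrete, here is a minimal sketch of normalising a few plain collection objects into a single column-oriented Hash. It is plain Ruby and only an assumption about what such a step could look like; to_columns is a hypothetical helper, not part of Red Visualizer.

```ruby
# Hypothetical sketch: normalise several plain Ruby data sources into one
# column-oriented Hash. `to_columns` is an illustrative helper only.
def to_columns(source)
  case source
  when Hash
    # Treat each key as a column name and each value as that column's data.
    source.each_with_object({}) do |(name, values), cols|
      cols[name.to_sym] = values.to_a
    end
  when Enumerable
    rows = source.to_a
    if !rows.empty? && rows.all? { |row| row.is_a?(Array) }
      # Sequence of [x, y] pairs: split into :x and :y columns.
      { x: rows.map(&:first), y: rows.map(&:last) }
    else
      # Flat sequence: use the index as x.
      { x: (0...rows.size).to_a, y: rows }
    end
  else
    raise ArgumentError, "unsupported data source: #{source.class}"
  end
end

from_array = to_columns([3, 1, 4])
from_pairs = to_columns([[0, 10], [1, 20]])
from_hash  = to_columns({ time: 1..3, value: [2.0, 4.0, 8.0] })
```

Data frames and numerical arrays would need their own branches (or a registration mechanism), which is where most of the design work for this project lies.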

And we want to support the following plotting backends:

This is the design concept image:

For GSoC 2018

In this GSoC period, we aim to realize basic functionalities of Red Visualizer and to support the following things:

  • Numo-NArray data source
  • NMatrix data source
  • Daru data source
  • numpy (pycall) data source
  • pandas (pycall) data source
  • Gnuplot backend
  • rbplotly backend
  • HoloViews (pycall) backend

GSoC Project information

  • Skills:
    • Some experience with Ruby and Python (and Julia)
    • Basic understanding of data visualization
    • Familiarity with the following things
      • Numo-NArray
      • NMatrix
      • Daru
      • pycall
      • numpy
      • pandas
      • Gnuplot
      • Plotly
      • HoloViews
  • Mentors:
    • Kenta Murata (@mrkn): CRuby committer and PyCall developer.
  • Co-Mentors:
    • Kouhei Sutou (@kou): CRuby committer, Red Data Tools founder, and Apache Arrow PMC.
  • Disclaimer:
    • The mentors and co-mentors live in the JST timezone, so it is better if the student also lives in the same timezone.

Numerical projects

Releasing CRuby GIL with Rubex

The GIL inhibits true multi-threading in Ruby. Rubex can be used as a medium for easily releasing the GIL and creating native threads.

The GIL releasing interface of Rubex will have two ways of interaction:

  • Have a block named no_gil that will contain code that should be executed without the GIL.
  • Have C functions that will be tagged with no_gil and will run like that when invoked.

The second approach is very simple since it mainly involves making some syntax modifications to the C function. Basically this will involve augmenting the normal cdef function defining syntax with a no_gil keyword. For example:

#@no_gil
cdef function_without_gil 

end

The first approach is more complex to implement but easier to use from a user's perspective. It will involve a syntax similar to a Ruby block for releasing the GIL. The Rubex compiler will evaluate this block specially. Syntactically it will look like this:

def small_compute
  int i = 0
  int j = 0
  no_gil do 
    for j < 100
      j += i
      i += 1
    end
  end
  
  return j
end

The above function will call rb_thread_call_without_gvl() from the Ruby C API, or other similar functions, and execute the contents of the block without the GIL. This functionality will first require the implementation of Ruby blocks and, most notably, closures in Rubex, since the block must be able to access the variables of the calling function. It will be possible to call the native C pthreads API using this functionality.

Neither of the above methods will allow the use of Ruby objects inside the no_gil part. It will be an exclusive C zone.

Native CUDA kernels with Rubex and RbCUDA

Similar to native CUDA kernel support in Julia, we should have support in Ruby.

Since it is tough to augment the Ruby VM to support this, it can be done in an easier way using Rubex and RbCUDA. For example, a sample Rubex method that would compile to a native CUDA kernel can be defined with cudadef and written like this:

require 'rbcuda_native'
require 'rbcuda'

include RbCUDA::Driver

cudadef kernel_vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

# generate some data
len = 512
a = Array.new(len) { rand(100) }
b = Array.new(len) { rand(100) }

# allocate & upload to the GPU
d_a = GPU_Array.new(a)
d_b = GPU_Array.new(b)
d_c = GPU_Array.new(d_a)

# execute and fetch results.
kernel_vadd(d_a, d_b, d_c) with cuda(1,len)
c = d_c.to_cpu_array

  • Skills: C/C++, CUDA, Ruby, Ruby C API, parallel programming, familiarity with design of compilers.
  • Mentors: Sameer Deshmukh (@v0dro), Prasun Anand (@prasunanand)
  • Difficulty: Moderate.
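When developing such a kernel, a CPU reference is handy for checking the GPU output. The following is a minimal plain-Ruby sketch of what the vector-add kernel above computes; vadd_reference is a hypothetical helper and does not use Rubex or RbCUDA.

```ruby
# CPU reference for the vector-add kernel: c[i] = a[i] + b[i].
# Plain Ruby only; `vadd_reference` is an illustrative name, not a library API.
def vadd_reference(a, b)
  raise ArgumentError, 'length mismatch' unless a.length == b.length

  a.zip(b).map { |x, y| x + y }
end

len = 512
a = Array.new(len) { rand(100) }
b = Array.new(len) { rand(100) }
c = vadd_reference(a, b)
```

The array returned by the GPU version could then be compared element by element against c.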

NMatrix projects

NMatrix is SciRuby's numerical matrix core, implementing dense matrices as well as two types of sparse (linked-list-based and Yale/CSR). NMatrix is a fairly well-established project which has received Summer-of-Code-like grants from both Brighter Planet and the Ruby Association (in other words, from Matz, who created Ruby). Those who contribute to NMatrix will likely eventually become authors of a jointly-published peer-reviewed science article on the library. Additionally, NMatrix is a good place to gain practical C and C++ experience, while also working to improve Ruby.

NMatrix currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for several of its linear algebra operations. In some cases, native versions of the functions are implemented, so that the libraries are not required. There are quite a number of areas for growth in terms of the capabilities of NMatrix here.

Speed up element-wise operations in NMatrix

  • Mentors: John Woods (@mohawkjohn), Prasun Anand (@prasunanand)
  • Per this discussion, constraints of the Ruby language currently slow down element-wise addition and subtraction for NMatrix objects. There are possibly some work-arounds, described in that email thread. A successful proposal would involve some preliminary research and design work on how to speed up element-wise operations.
  • Recommended skills: Some C/C++ would be beneficial, as you'll need to be working under the hood on NMatrix.

Supporting RMagick in NumBuffer

NumBuffer is a library that provides an abstraction layer over in-memory numeric arrays. It is currently under development to support Numo::NArray and NMatrix. In this GSoC project, a student will try to add support for RMagick in NumBuffer so that we can read an image's memory as a numerical array such as Numo::NArray.

  • Prerequisites: experience with C/C++ programming, Numo::NArray, NMatrix, numpy, ImageMagick
  • Mentor: Kenta Murata (@mrkn)
  • Difficulty: Moderate.
  • Disclaimer: The mentor lives in the JST timezone, so it is better if the student also lives in the same timezone.

Daru and general Ruby projects

  • Mentors: Victor Shepelev (@zverok);
  • Co-mentors: Athitya Kumar (@athityakumar), Shekhar Prasad Rajak (@Shekharrajak);
  • Recommended skills: some (maybe very little) experience with the Ruby/Rails ecosystem; an understanding of, or readiness to understand, what other (non-scientific) Rubyists love and want

Business Intelligence with daru

Come up with your own ideas for business intelligence applications with daru. It can be especially suitable for these types of software:

  • Reporting and querying software. Think of a library with daru inside and a business-ready DSL outside, like "fetch something from the DB, prepare several reports and export them to a spreadsheet".
  • Digital dashboards (representing some data in many ways: tables, visualisations and so on)
  • Data cleansing (see below as a separate point)

Data cleaning library

Data pre-/post-processor for daru, akin to R's janitor toolset: finding and dropping problematic rows and columns, getting rid of outliers, recoding wrong column types and so forth.
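As a rough illustration of these cleaning steps, here is a minimal sketch that works on rows stored as an Array of Hashes. It deliberately avoids the daru API, and the helper names (parse_number, recode_numeric, drop_incomplete, drop_outliers) are hypothetical; a real library would operate on Daru::DataFrame objects instead.

```ruby
# Hypothetical sketch of cleaning steps on rows stored as an Array of Hashes.

# Convert a value to Float, returning nil when it cannot be parsed.
def parse_number(value)
  Float(value)
rescue ArgumentError, TypeError
  nil
end

# Recode a column that should be numeric but was read as mixed types.
def recode_numeric(rows, column)
  rows.map { |row| row.merge(column => parse_number(row[column])) }
end

# Drop rows in which any value is nil or an empty string.
def drop_incomplete(rows)
  rows.reject { |row| row.values.any? { |v| v.nil? || v == '' } }
end

# Keep only rows whose value in `column` lies inside `range`.
def drop_outliers(rows, column, range)
  rows.select { |row| range.cover?(row[column]) }
end

raw = [
  { name: 'a', temp: '36.6' },
  { name: 'b', temp: 'n/a' },
  { name: 'c', temp: nil },
  { name: 'd', temp: '412.0' }, # obvious data-entry error
  { name: 'e', temp: 37 }
]

recoded = recode_numeric(raw, :temp)
clean = drop_outliers(drop_incomplete(recoded), :temp, 30.0..45.0)
```

Here the outlier rule is a fixed valid range supplied by the user; a statistical rule (for example one based on quartiles) would be a natural extension.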

Ruby/Rails data analysis tools

In order to be closer to "general" Ruby developers, we could work on daru-based load-analyse-process-visualise data tools for such kinds of information as:

  • Rails logs;
  • ruby-prof and other measuring tools output;
  • ...

The project in this area should go as a library, which:

  • uses daru, daru-io, daru-view;
  • can load data from specified format (say, Rails logs);
  • has a set of easy-to-use, already set up visualisation and grouping/analysing routines, producing useful and understandable results;
  • includes a set of demos (stand-alone scripts, examples that can be integrated into web-framework dashboards, and IRuby notebooks) showcasing the usage and utility of the library.
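As a small illustration of the load and analyse steps for Rails logs, the sketch below pulls the status code and duration out of "Completed ..." lines and summarises them per status. It is plain Ruby with hypothetical helper names (parse_completed, summarise); a real tool would hand the parsed rows to daru for grouping and to daru-view for plotting.

```ruby
# Hypothetical sketch: parse Rails "Completed" log lines and summarise them.

# Extract { status:, ms: } from every line that reports a completed request.
def parse_completed(lines)
  pattern = /Completed (?<status>\d{3}) .*? in (?<ms>\d+)ms/
  lines.map do |line|
    match = pattern.match(line)
    { status: match[:status].to_i, ms: match[:ms].to_i } if match
  end.compact
end

# Group the parsed rows by status code and compute count and mean duration.
def summarise(rows)
  rows.group_by { |row| row[:status] }.map do |status, group|
    times = group.map { |row| row[:ms] }
    [status, { count: times.size, mean_ms: times.sum.to_f / times.size }]
  end.to_h
end

log = [
  'Started GET "/posts" for 127.0.0.1 at 2018-03-22 10:00:00 +0000',
  'Completed 200 OK in 35ms (Views: 20.1ms | ActiveRecord: 3.2ms)',
  'Completed 200 OK in 45ms (Views: 30.0ms | ActiveRecord: 5.0ms)',
  'Completed 404 Not Found in 12ms'
]

report = summarise(parse_completed(log))
```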

NetworkX.rb

  • Mentors: Sameer Deshmukh (@v0dro), Athitya Kumar (@athityakumar), Mridul Seth (@MridulS);
  • Recommended skills: Some experience with Python and/or Ruby, a basic understanding of graphs, and familiarity with the networkx library.

A network analysis and graph library for Ruby, based on Python's NetworkX library. It is intended to handle various use cases of the graph data structure. The classes to be implemented are:

  • Graph (or, undirected graph)
  • DiGraph (or, directed graph)
  • MultiGraph
  • MultiDiGraph
  • Weights and other parameters of an edge are supposed to be specified as keyword arguments.

Each of these graph classes has

  • enumeration facility to do something like graph.each_node { |n| puts n }
  • set of manipulation functions such as add_edge, add_node, etc.
  • set of algorithms like BFS, DFS, etc.
  • set of analysis functions such as cardinality, diameter, etc.
  • set of IO methods (establish a bridge between NetworkX graphs and Daru DataFrames, and then use daru-io?)
  • set of plotting methods and View Helpers for usage in web applications (similar to daru-view)

The approach currently being considered for all graph classes is a nested Hash data structure. However, if you feel there's a better way to handle the data, feel free to suggest it here and/or in this mailing thread. The nested Hash used internally looks like this:

graph_hash
#=> {node_1: {node_2: {weight: 5}}, node_2: {node_1: {weight: 3}}}

graph_hash[:node_1]
#=> {node_2: {weight: 5}}

graph_hash[:node_1][:node_2]
#=> {weight: 5}
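To show how the nested Hash could back one of the graph classes, here is a minimal sketch of an undirected Graph with add_node, add_edge, each_node and a breadth-first search. The method names follow the bullets above, but the implementation is only an assumption and not the NetworkX.rb API.

```ruby
# Hypothetical sketch of an undirected Graph backed by a nested Hash.
class Graph
  def initialize
    # node => { neighbour => { attribute => value } }
    @adj = Hash.new { |hash, node| hash[node] = {} }
  end

  # Add an isolated node (no effect if it already exists).
  def add_node(node)
    @adj[node]
    self
  end

  # Edge attributes such as weight are passed as keyword arguments.
  # Both directions share the same attribute Hash because the graph is undirected.
  def add_edge(u, v, **attrs)
    @adj[u][v] = attrs
    @adj[v][u] = attrs
    self
  end

  def each_node(&block)
    @adj.keys.each(&block)
  end

  # Neighbours of a node together with the edge attributes.
  def [](node)
    @adj.fetch(node)
  end

  # Breadth-first search returning nodes in visiting order.
  def bfs(start)
    visited = [start]
    queue = [start]
    until queue.empty?
      node = queue.shift
      @adj.fetch(node).each_key do |neighbour|
        next if visited.include?(neighbour)

        visited << neighbour
        queue << neighbour
      end
    end
    visited
  end
end

graph = Graph.new
graph.add_edge(:node_1, :node_2, weight: 5)
graph.add_edge(:node_2, :node_3, weight: 3)
graph.add_node(:node_4)
order = graph.bfs(:node_1)
```

A DiGraph would store the attributes in one direction only, which is what allows the two directions to carry different weights, as in the example Hash above.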

CUDA kernel fusion with Cumo

Similar to CUDA kernel fusion support in CuPy, we want to support it in Ruby.

CuPy's kernel fusion is still limited to elementwise kernels. We should consider how to support reduction kernels, or rather, how to fuse with kernels of other libraries such as cuDNN. The CUDA LLVM Compiler or NVVM are potential choices. You have to investigate how to achieve CUDA kernel fusion.
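For readers new to the term, the following CPU analogy in plain Ruby shows what fusing elementwise operations means: the unfused version makes one pass and one temporary array per operation, while the fused version applies the combined expression in a single pass. It involves no Cumo or CUDA calls, and the function names are hypothetical.

```ruby
# CPU analogy for elementwise kernel fusion, in plain Ruby (no Cumo or CUDA).

# Unfused: each operation makes a full pass and allocates a temporary array,
# much like launching three separate elementwise kernels.
def unfused(x, y)
  t1 = x.each_index.map { |i| x[i] * y[i] } # pass 1: multiply
  t2 = t1.map { |v| v + 1.0 }               # pass 2: add one
  t2.map { |v| Math.sqrt(v) }               # pass 3: square root
end

# Fused: the same three operations combined into one expression and applied
# in a single pass with no intermediate arrays.
def fused(x, y)
  x.each_index.map { |i| Math.sqrt(x[i] * y[i] + 1.0) }
end

x = [1.0, 2.0, 3.0]
y = [3.0, 4.0, 5.0]
separate = unfused(x, y)
combined = fused(x, y)
```

On a GPU the benefit comes from fewer kernel launches and less traffic to global memory; the project would need to generate such fused kernels automatically.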

  • Skills: C/C++, CUDA, Ruby, Ruby C API
  • Mentor: Naotoshi Seo @sonots