Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compare various SQL database implementations #579

Closed
jeff1evesque opened this issue Nov 16, 2014 · 7 comments
Closed

Compare various SQL database implementations #579

jeff1evesque opened this issue Nov 16, 2014 · 7 comments

Comments

@jeff1evesque
Copy link
Owner

We will research the best SQL database implementation for both the SVM related data, and its corresponding web-interface (i.e. user administration).

Important aspects to consider, include whether a given database is supported by various common content management systems (i.e. drupal, wordpress), provides a phpMyAdmin equivalent, supports both php and python, and scales well for large projects.

@jeff1evesque
Copy link
Owner Author

SQLite is generally known to perform well for smaller applications:

  • easy setup, which allows for faster development
  • does not have user management
  • very little performance features
  • doesn't scale well, especially for enterprise level applications.

MySQL is generally known to perform well for larger applications:

  • provides user management
  • more complicated to set up
  • more optimization options
  • can scale well if properly tuned

@jeff1evesque jeff1evesque changed the title Research database implementation Compare various SQL database implementations Nov 20, 2014
@jeff1evesque
Copy link
Owner Author

IRC #postgresql (11/20/14 ~ 7:10pm EST):

jeffreylevesque: how does mysql compare to postgres?

lluad: Postgresql will keep your data safe, and do it's best to help you avoid putting garbage in your database.
lluad: Mysql won't.
lluad: Postgresql handles concurrency much better than mysql or sqlite.
lluad: Many of the other postgresql advantages are things like custom types, support for complex SQL efficiently and so on - and you throw all those away any time you try and support different databases from an app.

jeffreylevesque: i'm on #drupal, and they are saying the CMS will run postgres

lluad: Beyond that it's something you really need to use postgresql to do what you want to do and see how it works out for your particular problems.

jeffreylevesque: which is encouraging, in case I decide the cms

lluad: Yup, it will.

jeffreylevesque: does postgresql provide a phpmyadmin equivalent?

lluad: There's phppgadmin.
lluad: It's ... not very good. (But nor is phpmyadmin).
lluad: There's also pgadmin3 - amongst many others - for GUI clients.
lluad: The commandline client, psql, is really quite good.

IRC #drupal 11/20/14 ~ 7:10pm EST):

jeffreylevesque: anyone know if drupal will run on postgres database?

KCowboy: jeffreylevesque, After you download drupal, go to yoursite.co/install.php and it will prompt you to setup the db
KCowboy: jeffreylevesque, One of the options is postgres, so you should be fine

cweagans: jeffreylevesque, I'd recommend using MySQL. Pretty much every Drupal site around uses it, so you know it will work well. Postgres isn't used quite as widely, so there may be some weird edge-case type things you run into. In theory, it's as simple as selecting the Postgres database type, but in practice, it likely won't work out like you hope.

jeffreylevesque: anyone know if drupal supports mariadb?

cweagans: It does.
cweagans: mariadb can be substituted pretty much anywhere you'd normally use MySQL.
cweagans: with no problems.

IRC #python 11/20/14 ~ 8:10pm EST):

therealfakemoot: jeffreylevesque, postgresql is a vastly superior solution to mysql

jeffreylevesque: I want something that scales well big, dynamically. Possibly be used across several CMS's

therealfakemoot: postgres

ssbr_at_work: never ever mysql
ssbr_at_work: if you're tempted by mysql, then mariadb

jeffreylevesque: howabout postgresql vs mariadb?

therealfakemoot: jeffreylevesque, here's the hierarchy: use sqlite if you do not need concurrent writes. use postgres for everything else.

IRC ##php 11/20/14 ~ 8:50pm EST):

jeffreylevesque: how does mariadb, postres, mysql fair in php, and in general?

iateadonut: mariadb is an opensource fork of mysql. i haven't seen how development compares to mysql since it split.

@jeff1evesque
Copy link
Owner Author

The following indicates language support across the various databases:

MariaDB PostgreSQL MySQL
Ada .NET Ada
C C C
C# C++ C#
C++ Java C++
D Perl D
Eiffel Python Eiffel
Erlang Tcl Erlang
Haskell Haskell
Java Java
Objective-C Objective-C
OCaml OCaml
Perl Perl
PHP PHP
Python Python
Ruby Ruby
Scheme Scheme
Tcl Tcl

@jeff1evesque
Copy link
Owner Author

MariaDB generally outperforms MySQL, supports more languages than PostgresSQL, and is an acceptable database platform for common content management systems such as Drupal. Also, like MySQL, MariaDB supports phpMyAdmin.

Since this project implements both php, python, and may consider a content management system (i.e. drupal), MariaDB will be attempted.

Note: MariaDB can be implemented on the Raspberry Pi.

@matheusoliveira
Copy link

I saw some of your comments on IRC. And you are completely wrong about language support, I think all those languages must have PostgreSQL drivers, and I know for sure that at least the following do: Java, Perl, Python, Ruby, Tcl, C#, Node-JS, Go. In fact, in PostgreSQL about page is listed, and I quote: "There are interfaces for Java (JDBC), ODBC, Perl, Python, Ruby, C, C++, PHP, Lisp, Scheme, and Qt just to name a few". So checking which language both support is just pointless, as both support the most languages out there.

Also, it supports all (and a lot of others) languages as procedural languages inside the database. Which is great as many machine learning libraries are implemented in languages such as Python or R and you can use those inside the database (specially madlib, which is Python and specific for PostgreSQL). Still for machine learning, PostgreSQL has a powerful support for arrays and index those very well with GIN and GiST, and can perform KNN search out of box.

Disclaimer: Clearly I'm a PostgreSQL guy, so my comments are biased. Although before you actually choose one, I recommend you testing those in practice with your application (it is even OK to support them all for some time during the development), you should learn at least the basics of which and do some more advanced work (guys on IRC can help a lot), and test performance too with different database sizes (but keep in mind you might need some tuning on them for optimal performance), I surely did that before noticing that I wanted to be a "PostgreSQL guy", ;-).

@willfong
Copy link

This is really a worthless comparison. It's the same as comparing a Ford to a Chevy, a screwdriver to a hammer, McDonalds to Burger King, Christianity to Buddhism. There is no "best".

Stated your requirements, and use the product that fits that description.

@jeff1evesque
Copy link
Owner Author

IRC #machinelearning 11/22/14 ~ 3:20pm EST):

zmjjmz: jeffreylevesque: sklearn is a good one to start with, easy to learn and has a bunch of stuff
zmjjmz: it's not the fastest though for some models, and can be fairly opaque to like add stuff to

jeffreylevesque: for large projects, how do most store the data associated with the library - textfile, or database?

zmjjmz: currently we're using hdf5 for about a 25GB dataset, IMO it's gonna be kind of a nightmare but we'll see
zmjjmz: largely depends on what you need to do with the dataset

jeffreylevesque: zmjjmz, what's hdf5?

zmjjmz: jeffreylevesque, http://en.wikipedia.org/wiki/Hierarchical_Data_Format

jeffreylevesque: zmjjmz, would you say it's faster and more optimized than traditional sql databases (i.e. postgres, mariadb)?

zmjjmz: jeffreylevesque: no
zmjjmz: however easier to access data from

jeffreylevesque: zmjjmz: easier to access, so a little bit faster?

zmjjmz: so sql will usually do a lot more work than is necessary for most of your ml tasks
zmjjmz: you can get away with a simple filesystem like structure since you're not doing like crazy joins and such usually
zmjjmz: though if your data is structured in a way that needs multiple tables I would use a real db -- but ultimately it'd be faster to develop and work with hdf5

RandIter: i think that the word hierarchical doesn't do enough justice to represent the benefits of HDF5
RandIter: esp. considering vgroups and such

zmjjmz: true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants