Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Access Layer #161

Open
jeevatkm opened this issue Apr 7, 2018 · 11 comments
Open

Data Access Layer #161

jeevatkm opened this issue Apr 7, 2018 · 11 comments
Assignees
Labels
Projects

Comments

@jeevatkm
Copy link
Member

@jeevatkm jeevatkm commented Apr 7, 2018

aah's goal of Data Access Layer as follows-

  • Support RDBMS and NoSQL data sources
  • Seamless usage of multiple data sources (be it RDMBS or NoSQL) and having one as default data source
  • Data source aware model/entity, so that aah user doesn't have to deal with hard part targeting different data source.

I will do my analysis first then planned out implementation part.


Data Access scope is vast, I'm gonna go by step by step

  • Create foundation
  • Mapping/Binding Result Sets into struct
  • CURD style operations
  • Persistence events/hooks
  • Multiple Database support (to begin with MySQL & Postgres)
  • Multiple DataSource support and connection management
  • Prepared Statements with Named Arguments binding from struct, map
  • Native query execution
  • Query Cache in-memory to begin with

Upcoming List

  • Database tables to struct Model generation type safe
  • Datbase Migration (up & down) with audit-trail
  • Transactions
  • Data Source aware models
  • Debug logging (first release would have basic level)
  • Filters/Processors
  • etc
@jeevatkm jeevatkm created this issue from a note in aah Roadmap (Backlog) Apr 7, 2018
@jeevatkm jeevatkm moved this from Backlog to v0.11.0 - Iteration in aah Roadmap Apr 7, 2018
@jeevatkm jeevatkm added this to the v0.11.0 Milestone milestone Apr 7, 2018
@jeevatkm jeevatkm added the feature label Apr 8, 2018
@broklyngagah
Copy link

@broklyngagah broklyngagah commented Apr 10, 2018

I use gorm in all of my RDBMS projects. +1 for gorm..

@jeevatkm
Copy link
Member Author

@jeevatkm jeevatkm commented Apr 10, 2018

@broklyngagah Thank you for your inputs.

I'm currently in the process of figuring out and provide meaningful design and usage - As you know
"Modern application uses RDBMS as well as NoSQL".

@blagodus
Copy link

@blagodus blagodus commented Apr 12, 2018

@jeevatkm you can look at these packages for inspiration(integration):

Do you plan to build new package or just integrate existing?

@jeevatkm jeevatkm changed the title Database Access Layer Data Access Layer Apr 12, 2018
@jeevatkm
Copy link
Member Author

@jeevatkm jeevatkm commented Apr 12, 2018

@blagodus Thank you for your inputs, I will surely look into it. It seems in the initial write up my goals didn't come out properly. I have just update it.

Answer to your question, I have not yet decided the approach whether new package or integrating existing. I will be doing my analysis inline with goals of this implementation.

Once analysis is done, I would publish my direction on this thread.

@AugustHell
Copy link

@AugustHell AugustHell commented Apr 25, 2018

Seamless usage of multiple data sources (be it RDMBS or NoSQL) and having one as default data source

You might have a look on the awesome CockroachDB. It is written in go, incredible fast, made for scaleability and it can be connected thru postgres-adapters. JSON types makes it interesting for NoSQL users too.

@jeevatkm
Copy link
Member Author

@jeevatkm jeevatkm commented Apr 25, 2018

@AugustHell Thank you for your inputs.

@AugustHell
Copy link

@AugustHell AugustHell commented Apr 26, 2018

The database layer is a big topic in my eyes. Most frameworks struggle on that part or even touch it.
In the go world I haven't find yet a really good one.
Some thoughts:

Multiple Connection Handling

As write operations are the heavy impacts for any database, a solution for high traffic sites is to have a pure read and a read/write connection to two different servers that are handling the sync on their own.
Another real life example are geo located connections to serve data from the nearest db server to keep latency low.

Permanent and Temporary Connections

The number of db connections is most time limited, so a permanent connection seems great, but is not in any case. A developer should have the freedom of choice to use what he needs.

Database Modeling

I really like to have a structured model for each db table I use. In projects you need a lot of tables, you get quick bored creating them. The helping methods I know of, are creating thru command line scripts, auto migration like in gorm, or extracting from the database's "SHOW TABLE".
While some databases have great modeling tools, like eg MySQL Workbench, others don't. Hence offering only one way of modeling is limiting it's use.

Topic based Migrations

When it comes to migrations the most seen solution is a time based one with up/down scripts. In my opinion, this is not really practicable as when developing in a team on a bigger project not every one is focused on the same tables, Soon there are a lot of up and downs you have to do if you want to revert one manipulation. A table based versioning reflecting the table relations would be nicer.

Multiple Query Cache

While most databases have a query cache, table joins are sometimes not the most perfomant solution. So you hack your data needs with serveral querys together. Putting that in a cache can speed up the app immense. So if it comes to a query builder, some kind of transaction block which aim is not the data integrity but the caching of the output would be nice to have.

That's not a complete list, just what came to my mind at the moment.

@jeevatkm
Copy link
Member Author

@jeevatkm jeevatkm commented Jun 8, 2018

Update: My homework and analysis are progressing well on data access layer, however I need further time to start foundation. So pulling this one out from v0.11.0 and making v0.11.0 Milestone release.

Thank you very much for the support and understanding.

@jeevatkm jeevatkm removed this from the v0.11.0 Milestone milestone Jun 8, 2018
@jeevatkm jeevatkm moved this from v0.11.0 - Iteration to Backlog in aah Roadmap Jun 9, 2018
@jeevatkm jeevatkm self-assigned this Jul 14, 2018
@jeevatkm
Copy link
Member Author

@jeevatkm jeevatkm commented Jul 14, 2018

I have updated with plan and pulled out from initial and adding it here for reference:


Following is initial leg work on RDBMS findings-

I have done my homework (read documentation, source code) around many data access layer libraries exists now. It turns out following is the viable candidates for integration as data access layer with aah.

@AugustHell
Copy link

@AugustHell AugustHell commented Dec 19, 2018

Over the past week I researched more about this topic, as I really need a better solution that are yet out there. My focus is a sql database (either cockroach with postgress driver or percona server with mysql driver), while keystores in memory like redis or bolt db are still in mind for heavy perfomance uses.

So here are some more thoughs for the data acess layer, i came accross:

Transactions and prepared statements can be a pitfall
Both depend on the connection they get executed and both are involving extra round trips to the database. However transactions are great for data integrety operations and absolutely need for them. Prepared statements are only useful for repeated queries on the same connection, if the database and the db driver supports them natively. Reason is, that the prepared statement has 3 round trips to the database. On the other hand repeated queries can get boosted with more db connections out of the pool, or by avoiding them by building queries that give back all the needed data at once. In a lot of cases, it's better to do the preparation of paramentarized queries in the program and send it to the db, instead of letting the database doing it.

Row Identifiers and unique id's
Using a sql db's auto increment id can lead to problems when backing up and restoring the data, as well as for concurency. A better way to identify entities/rows is by letting the program generate the uuid and past it as string. This avoids having a maximum number of id (in opposite of int32, which some db's still use). Even building id's by the program, leads to independency of the choosen data persistence solution. An approach for generating could be: https://github.com/sony/sonyflake which is based on time and machine id. As machines can crash and app portability is better, I would prefer using an application id instead of a machine id.

Controller, model and data access layer dependencies
Whatever persistence solution is used now, might not be the best in the near future. We've seen a lot of new approaches with kv-stores, nosql, cloud services, sharding and so on in the last years. Developing an application with a framework should not depend on the persistence used at initial design. Decoupling the data access layer from controllers and models makes it easier to test too. To get independent of the data source, the database/sql package defines interfaces and the data access layer should do it too. The base idea of that is reprensented in this: https://husio.github.io/blog/post/accessing-data-in-go/

Perfomance wise
Depending on the applications use case, access to the data can get heavily time consuming and is often the bottleneck of the app performance wise. While using a ORM like gorm feels great in developing, it's reflection and query builder is a real performance hit leading to multiply the data access times by factors. Another approach is using code generation at build time like https://github.com/volatiletech/sqlboiler or https://github.com/gnormal/gnorm. Well, the first time I've seen that, it didn't look very comfortable for me, but as more as I learn about whats going up behind what influence it has on being productive as well as writing a perfomant application, my interest growed. As aah is already generating code, using parts of that idea might be worth to think of. For instance in data access controllers a query builder can help the developer, while the queries itselves are composed at buildtime, instead of building and executing them at runtime.

@jeevatkm
Copy link
Member Author

@jeevatkm jeevatkm commented Dec 19, 2018

@AugustHell Thank you for your inputs. I will utilize your insights while design and implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
aah Roadmap
  
Backlog
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants