unique elastic index for each table #148

Closed
measwel opened this issue Jan 15, 2018 · 11 comments

@measwel

measwel commented Jan 15, 2018

Is it possible to have the driver create a separate, unique index for each table indexed with Elasticsearch? I am asking because I currently run into problems with fields that have the same name in different tables. Elasticsearch does not allow equally named fields to have different mappings within the same index. The solution is either to rename the fields (costly) or to have separate indexes per table.

There is one more possibility: to store all entities in the same Cassandra table, differentiate them by 'type', and store the diverging column data in a stringified text field. This would be a weird solution and also costly in terms of refactoring.

@measwel
Author

measwel commented Jan 16, 2018

My suggestion would be the following:

      elasticsearch: {
        apiVersion: '5.5',
        sniffOnStart: true,
        host: config.elastic_server,
        requestTimeout: 60000,
        indexPerTable: true // proposed option: create a separate index per Cassandra table
      }

indexPerTable:
true: create indexes called keyspacename_tablename
false: create just one index called keyspacename ( default )

If indexPerTable is true, the driver should query the correct index automatically based on the cassandra table name of the associated database model.
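
The naming rule described above could be sketched roughly as follows. This is only an illustration of the proposal: `resolveIndexName` is a hypothetical helper, not a function of the driver, and `indexPerTable` is the option suggested in this comment rather than an existing setting.

```javascript
// Hypothetical helper illustrating the proposed naming rule.
// indexPerTable = true  -> one index per table: "<keyspace>_<table>"
// indexPerTable = false -> one shared index named after the keyspace (default)
function resolveIndexName(keyspace, table, indexPerTable = false) {
  return indexPerTable ? `${keyspace}_${table}` : keyspace;
}
```

With this rule, a model backed by table `person` in keyspace `mykeyspace` would be indexed and queried in `mykeyspace_person` when the option is on, and in `mykeyspace` otherwise.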

This solution would eliminate the problem of clashing field names in different entities - which is quite a big problem if the data model is large.

In my case, I have 2 entities which both have a 'location' field. With just 1 index I would have to rename the field in one of the entities. The problem is - besides refactoring the application - that I currently query both entities and merge the results in 1 response. This allows me to use the 'location' property in the frontend regardless of the underlying object. In other words, I can treat all returned objects the same way when displaying the location field.
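
To make the merging step concrete, here is a minimal sketch of how hits from two searches might be combined so that the frontend can read `location` uniformly. The helper `mergeHits`, the entity names, and the sample data are all made up for illustration; only the `hits.hits[]._source` response shape comes from Elasticsearch.

```javascript
// Hypothetical sketch: flatten the hits of several Elasticsearch search
// responses into one list, tagging each document with the entity it came from.
function mergeHits(responsesByEntity) {
  const merged = [];
  for (const [entity, response] of Object.entries(responsesByEntity)) {
    for (const hit of response.hits.hits) {
      // A field named "entity" inside _source would override the tag.
      merged.push({ entity, ...hit._source });
    }
  }
  return merged;
}

// Made-up sample responses for two entities that both have a "location" field.
const merged = mergeHits({
  person: { hits: { hits: [{ _source: { name: 'Ann', location: 'Sydney' } }] } },
  shop: { hits: { hits: [{ _source: { title: 'Bakery', location: 'Perth' } }] } },
});

// The caller can read "location" without caring about the underlying entity.
const locations = merged.map((doc) => doc.location);
```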

@measwel
Author

measwel commented Jan 17, 2018

Dear Masum,

I have started to rework the driver to support multiple ES indexes. I will let you know when it's ready or when I get stuck, whichever happens first :) If I can do it, I will give you the code.

UPDATE: my changes seem to work fine :) Now each table gets its own index and the problem with conflicting field names is gone!

I will test tomorrow.

@measwel
Author

measwel commented Jan 18, 2018

Dear Masum,

Attached you will find the code of the modified driver. I have marked the modified places with comments starting with: // MOD

The default in this implementation is to create separate ES indexes for each model which defines an ES mapping. The reasoning is:

  • The driver searches per table anyhow.
  • It is easy to query ES over multiple indexes simultaneously if needed.
  • Smaller indexes should be faster and less prone to crashing.
  • Having separate indexes allows mapping equally named fields in multiple tables without conflicts. This allows one to return results from multiple tables and use the same field names to process the data in a concise way.
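
As an illustration of the multi-index point above: the Elasticsearch search endpoint accepts a comma-separated list of index names in the URL. The helper below is a hypothetical sketch that only builds that request path; it is not part of the modified driver, and the index names in the usage note are examples.

```javascript
// Hypothetical helper: builds the REST path for a search over several indexes.
// Elasticsearch accepts a comma-separated list of index names before /_search.
function multiIndexSearchPath(indexNames) {
  if (!Array.isArray(indexNames) || indexNames.length === 0) {
    throw new Error('at least one index name is required');
  }
  const encoded = indexNames.map((name) => encodeURIComponent(name));
  return `/${encoded.join(',')}/_search`;
}
```

For example, `multiIndexSearchPath(['mykeyspace_person', 'mykeyspace_shop'])` yields `/mykeyspace_person,mykeyspace_shop/_search`, so one request can cover both per-table indexes.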

This change is transparent to the user. Nothing changes in the way the driver functionality can be used. To go back to creating just one ES index for all models one can set:

ESindexPerModel: false

in the ORM options.

PS Please let me know if you decide to integrate this change into the latest driver version, so I can update and use the official package. You can reach me at marek_karczewski@yahoo.com.au

express-cassandra.zip

@masumsoft
Owner

masumsoft commented Jan 19, 2018

Hi, sorry for the delayed response. I was really busy in the meantime. I’ll try to have a look at it. But it would be great if you forked the repo, applied your modifications, and sent a pull request instead. That would help me see the diff easily on GitHub.

@measwel
Author

measwel commented Jan 19, 2018

I cloned the repo and created a branch in which I committed my changes. Please provide further instructions. How should I push and create the pull request?

@masumsoft
Owner

git push origin your_branch_name

And then...

https://help.github.com/articles/creating-a-pull-request/

@measwel
Author

measwel commented Jan 19, 2018

I don't have write access to the main repo. I found the fork option. I will apply the changes on the fork, then push.

@measwel
Author

measwel commented Jan 19, 2018

SourceTree is giving me errors when pushing the branch, but the push seems successful; the branch is in the 'measwel' fork. I have created a pull request.

I am proud of this change :) I hope you will like it and integrate it into the main branch, so I can use the official package.

@measwel
Author

measwel commented Feb 2, 2018

Dear Masum,

I have tried several times to upload the code to GitHub. For some reason I am getting upload errors. Attached are the 3 modified files. I turned off auto-formatting for them to minimize spacing issues. Can you please look at them and, if approved, integrate them into the main branch? You will see that the changes are actually quite minimal: one index is created per table instead of a single index for all tables. I think this is a valuable functionality upgrade.

Thank you greatly for the wonderful driver!

ESindexPerModel.zip

@masumsoft
Owner

masumsoft commented Feb 3, 2018

Integrated the changes with some modifications applied to them. The update is available in v2.2.0.

@measwel
Author

measwel commented Feb 3, 2018

Great news! Thank you so much. Judging from the new code, index per table is now the default :) Fantastic!

I have just updated the driver to the latest version and everything seems to run fine 👍
