Batch insert update problem! #218

Closed
zhouhero opened this Issue March 18, 2013 · 21 comments

3 participants

zhouhero Vivek Mishra Amresh
zhouhero

I use batch mode to insert data into 178 master tables. No exception was thrown and the operation appeared to succeed. But when I checked the number of records inserted into the tables, I found that the record count for some tables is incorrect. It seems that some data in the cache is not flushed to the tables!

[my env]
Kundera 2.4 + Cassandra 1.2
kundera.batch.size = 5000

Can you tell me where the problem might be?

Vivek Mishra
Collaborator

This should not be happening. I suggest you try to reproduce it with only the entities mapped to the tables that show an incorrect number of records. Please test and let me know if you run into any issues.

-Vivek

Vivek Mishra
Collaborator

Also, you may want to try:
em.flush() followed by em.close() for successful termination of em.

-Vivek

zhouhero

When I insert the data row by row without using batch, the record counts are correct.
By the way, I do call close() after flush() at the end of the insert, as you suggested.

Vivek Mishra
Collaborator

Hmm. Can you share your entity definition and test case?

-Vivek

zhouhero

Can you tell me how to inspect the data just before it is sent to the Cassandra server?

Vivek Mishra
Collaborator

There are two methods in CassandraClientBase.java :

1) onBatchLimit()
2) executeBatch().

Unfortunately they don't have any logging statements, but you can add some and try to debug it.

-Vivek

zhouhero

I found that this problem happens when two tables contain the same primary key.

// test1 table insert

test1Entity1.setID(UUID.fromString("13816710-1dd2-11b2-0000-242d50cf1ff7")); // uuid, pk
test1Entity1.setXX();
em.persist(test1Entity1);

test1Entity2.setID(UUID.fromString("13818e20-1dd2-11b2-0000-242d50cf1ffc")); // uuid, pk
test1Entity2.setXX();
em.persist(test1Entity2);

test1Entity3.setID(UUID.fromString("1381b530-1dd2-11b2-0000-242d50cf1ffc")); // uuid, pk
test1Entity3.setXX();
em.persist(test1Entity3);

test1Entity4.setID(UUID.fromString("1381dc40-1dd2-11b2-0000-242d50cf1ffc")); // uuid, pk
test1Entity4.setXX();
em.persist(test1Entity4);

test1Entity5.setID(UUID.fromString("13820350-1dd2-11b2-0000-242d50cf1ffc")); // uuid, pk
test1Entity5.setXX();
em.persist(test1Entity5);

test1Entity6.setID(UUID.fromString("13822a60-1dd2-11b2-0000-242d50cf1ffc")); // uuid, pk
test1Entity6.setXX();
em.persist(test1Entity6);

// test2 table insert
// test2Entity1 uses the same UUID as test1Entity1
test2Entity1.setID(UUID.fromString("13816710-1dd2-11b2-0000-242d50cf1ff7")); // uuid, pk
test2Entity1.setXX();
em.persist(test2Entity1);

test2Entity2.setID(UUID.fromString("13818e20-1dd2-11b2-0000-242d50cf1ff7")); // uuid, pk
test2Entity2.setXX();
em.persist(test2Entity2);

em.flush();

[the result]
records inserted into test1: 5 // wrong, should be 6!
records inserted into test2: 2

Vivek Mishra
Collaborator

So are test1 and test2 entities on the same column family or on different column families? (I assume different column families.) Just to clarify, the batch size is not specific to an entity but applies to the complete application. For example, if the batch size is 5, an implicit flush will happen even if you persist 3 entities for test1 and 2 for test2!
This is expected behaviour. The same UUID should not be an issue. Am I clear on this?

-Vivek
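
The behaviour described above (one batch limit shared by all entity types) can be illustrated with a minimal sketch. This is a plain-Java model, not Kundera code: the class `SharedBatchBuffer` and its methods are hypothetical names introduced only for illustration, and the real flush logic in Kundera may well differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an application-wide batch limit; NOT Kundera's actual code.
final class SharedBatchBuffer {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private int flushCount = 0;

    SharedBatchBuffer(int batchSize) {
        this.batchSize = batchSize;
    }

    // Queue one write. The table name plays no role in the limit check:
    // every write counts against the same shared batch size.
    void persist(String table, String rowKey) {
        pending.add(table + ":" + rowKey);
        if (pending.size() >= batchSize) {
            flush(); // implicit flush once the shared limit is reached
        }
    }

    // Send everything that is pending (here: just count the flush and clear).
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushCount++;
        pending.clear();
    }

    int flushCount() {
        return flushCount;
    }

    int pendingCount() {
        return pending.size();
    }
}

final class SharedBatchDemo {
    public static void main(String[] args) {
        SharedBatchBuffer buffer = new SharedBatchBuffer(5);
        for (int i = 0; i < 3; i++) {
            buffer.persist("test1", "k" + i); // 3 writes for test1
        }
        for (int i = 0; i < 2; i++) {
            buffer.persist("test2", "k" + i); // 2 writes for test2
        }
        System.out.println("flushes: " + buffer.flushCount());
        System.out.println("pending: " + buffer.pendingCount());
    }
}
```

In this model the fifth write triggers the flush even though neither table reached five writes on its own. With a batch size of 5000 and only a handful of writes, no implicit flush would occur before the explicit em.flush().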

zhouhero

First, test1 and test2 are different tables.
Second, my batch size is 5000.
Yes, a UUID cannot be repeated within one table, but the same UUID can appear in different tables.

Vivek Mishra
Collaborator

Can you share a sample project that reproduces this issue? It should include:
1) 2 entities pointing to different column families
2) The same UUIDs assigned to both entities, with the batch size set to 5000
3) The expected and actual results

-Vivek

zhouhero

test1Entity1's ID is the same as test2Entity1's ID.
When I change test1Entity1's ID so that it differs from test2Entity1's,
the data is inserted correctly into the test1 table!

Vivek Mishra
Collaborator

Please share your entity definition and CRUD to replicate this at our end

-Vivek

zhouhero

[my table]
create columnfamily test1 (attraction_id uuid primary key, del_flg int);
create columnfamily test2 (give_id uuid primary key, item_id int);

[my entity]

@Entity
@Table(name = "test1", schema = XXXX)
@XmlRootElement(name = "Test1Entity")
public class Test1Entity implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    @Column(name = "attraction_id")
    private UUID attractionId;

    @Column(name = "del_flg")
    private int delFlg;
    ...
}

@Entity
@Table(name = "test2", schema = XXXX)
@XmlRootElement(name = "Test2Entity")
public class Test2Entity implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    @Column(name = "give_id")
    private UUID giveId;

    @Column(name = "item_id")
    private int itemId;
    ...
}

[my insert code]
// set batchsize= 5000 in persistence.xml
...
EntityManager em = entityManagerFactory.createEntityManager();

Test1Entity test1Entity = new Test1Entity();
test1Entity.setAttractionId(UUID.fromString("13816710-1dd2-11b2-0000-242d50cf1ff7"));
test1Entity.setDelFlg(3);
em.persist(test1Entity);

test1Entity = new Test1Entity();
test1Entity.setAttractionId(UUID.fromString("13818e20-1dd2-11b2-0000-242d50cf1ffc"));
test1Entity.setDelFlg(5);
em.persist(test1Entity);

Test2Entity test2Entity = new Test2Entity();
test2Entity.setGiveId(UUID.fromString("13816710-1dd2-11b2-0000-242d50cf1ff7"));
test2Entity.setItemId(12);
em.persist(test2Entity);

em.flush();
em.close();

[the result]
test1 table record count: 1 // wrong, should be 2!
test2 table record count: 1
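
One way a symptom like this could arise is sketched below. It is only an illustration under an assumption: that pending writes are buffered in a map shaped as row key -> table -> columns, and that a new entry replaces, rather than merges with, what is already queued for the same row key. The thread does not confirm that this is the actual cause in Kundera, and `MutationMapSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a batch buffer shaped as rowKey -> table -> pending columns.
// It shows one way a row could be lost; it is NOT Kundera's actual code.
final class MutationMapSketch {

    // The three writes from the reproduction: {table, rowKey, column data}.
    static final String[][] REPRO_WRITES = {
        {"test1", "13816710-1dd2-11b2-0000-242d50cf1ff7", "del_flg=3"},
        {"test1", "13818e20-1dd2-11b2-0000-242d50cf1ffc", "del_flg=5"},
        {"test2", "13816710-1dd2-11b2-0000-242d50cf1ff7", "item_id=12"},
    };

    // Naive builder: each write puts a fresh inner map under its row key,
    // replacing anything already queued for that key in another table.
    static Map<String, Map<String, List<String>>> buildNaive(String[][] writes) {
        Map<String, Map<String, List<String>>> batch = new HashMap<>();
        for (String[] w : writes) {
            Map<String, List<String>> byTable = new HashMap<>();
            List<String> columns = new ArrayList<>();
            columns.add(w[2]);
            byTable.put(w[0], columns);
            batch.put(w[1], byTable); // overwrites an existing entry for this key
        }
        return batch;
    }

    // Merging builder: writes for the same row key in different tables coexist.
    static Map<String, Map<String, List<String>>> buildMerged(String[][] writes) {
        Map<String, Map<String, List<String>>> batch = new HashMap<>();
        for (String[] w : writes) {
            batch.computeIfAbsent(w[1], key -> new HashMap<>())
                 .computeIfAbsent(w[0], table -> new ArrayList<>())
                 .add(w[2]);
        }
        return batch;
    }

    // Count how many row keys in the batch carry a write for the given table.
    static int rowsFor(String table, Map<String, Map<String, List<String>>> batch) {
        int rows = 0;
        for (Map<String, List<String>> byTable : batch.values()) {
            if (byTable.containsKey(table)) {
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> naive = buildNaive(REPRO_WRITES);
        Map<String, Map<String, List<String>>> merged = buildMerged(REPRO_WRITES);
        System.out.println("naive  test1 rows: " + rowsFor("test1", naive));
        System.out.println("naive  test2 rows: " + rowsFor("test2", naive));
        System.out.println("merged test1 rows: " + rowsFor("test1", merged));
        System.out.println("merged test2 rows: " + rowsFor("test2", merged));
    }
}
```

Under this assumption the naive builder keeps 1 row for test1 and 1 for test2, which matches the counts reported above, while the merging builder keeps 2 and 1.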

zhouhero

I think you can reproduce this bug with the code above.
Please try it.

Vivek Mishra
Collaborator

Hi,
I confirm this is a bug and will get it fixed very soon.

-Vivek

zhouhero

Thank you for your help.
When and how can I get the fixed version?
I am using Kundera 2.2.1 now (I will change to 2.4 very soon).

Vivek Mishra
Collaborator

This may take a couple of days (the fix will be released with 2.5). However, there is a workaround: insert test1 from one entity manager and test2 from another entity manager. I guess that should work.

-Vivek

Amresh
Collaborator
xamry commented March 26, 2013

This has been fixed and latest code has been pushed to trunk.

A new test case for batch insertion involving multiple entities has been added here:

https://github.com/impetus-opensource/Kundera/blob/trunk/kundera-cassandra/src/test/java/com/impetus/client/crud/batch/CassandraBatchProcessorMixedTest.java

Please test and let me know if you face any issue.

-Amresh
Amresh
Collaborator
xamry commented April 17, 2013

Did you get a chance to verify this? Can we close this one?

Vivek Mishra
Collaborator

Releasing with 2.5

-Vivek

Vivek Mishra
Collaborator
mevivs commented July 09, 2013

Fixed and released with 2.5

Vivek Mishra mevivs closed this July 09, 2013