Batch insert update problem! #218

Closed
zhouhero opened this Issue Mar 18, 2013 · 21 comments


@zhouhero

I use batch mode to insert data into 178 master tables. No exception is thrown and the operation appears to succeed, but when I check the number of records inserted into the tables, I find that some tables have the wrong record count. It seems that some data in the cache is not flushed to the tables!

[my env]
Kundera 2.4 + Cassandra 1.2
kundera.batch.size = 5000

Can you tell me where the problem might be?

@mevivs
Collaborator
mevivs commented Mar 18, 2013

This should not be happening. I would suggest you try to replicate this with the entities that belong to the tables showing an incorrect number of records. Please test and let me know if you run into any issues.

-Vivek

@mevivs
Collaborator
mevivs commented Mar 18, 2013

Also, you may want to try:
em.flush() followed by em.close() for successful termination of em.

-Vivek

@zhouhero

When I insert the data row by row without using batch mode, the record counts are correct.
By the way, I do call close() after flush() at the end of the insert, as you suggested.

@mevivs
Collaborator
mevivs commented Mar 18, 2013

Hmm. Can you share your entity definition and test case ?

-Vivek

@zhouhero

Can you tell me how to inspect the data just before it is sent to the Cassandra server?

@mevivs
Collaborator
mevivs commented Mar 18, 2013

There are two methods in CassandraClientBase.java :

  1. onBatchLimit()
  2. executeBatch().

Unfortunately they don't have any logging statements, but you may add some and try to debug it.

-Vivek

@zhouhero

I found that this problem happens when two tables contain rows with the same primary key.

// test1 table insert

test1Entity1.setID(UUID.fromString("13816710-1dd2-11b2-0000-242d50cf1ff7")); // uuid, pk
test1Entity1.setXX();
em.persist(test1Entity1);

test1Entity2.setID(UUID.fromString("13818e20-1dd2-11b2-0000-242d50cf1ffc")); // uuid, pk
test1Entity2.setXX();
em.persist(test1Entity2);

test1Entity3.setID(UUID.fromString("1381b530-1dd2-11b2-0000-242d50cf1ffc")); // uuid, pk
test1Entity3.setXX();
em.persist(test1Entity3);

test1Entity4.setID(UUID.fromString("1381dc40-1dd2-11b2-0000-242d50cf1ffc")); // uuid, pk
test1Entity4.setXX();
em.persist(test1Entity4);

test1Entity5.setID(UUID.fromString("13820350-1dd2-11b2-0000-242d50cf1ffc")); // uuid, pk
test1Entity5.setXX();
em.persist(test1Entity5);

test1Entity6.setID(UUID.fromString("13822a60-1dd2-11b2-0000-242d50cf1ffc")); // uuid, pk
test1Entity6.setXX();
em.persist(test1Entity6);

// test2 table insert
test2Entity1.setID(UUID.fromString("13816710-1dd2-11b2-0000-242d50cf1ff7")); // uuid, pk, same as test1Entity1
test2Entity1.setXX();
em.persist(test2Entity1);

test2Entity2.setID(UUID.fromString("13818e20-1dd2-11b2-0000-242d50cf1ff7")); // uuid, pk
test2Entity2.setXX();
em.persist(test2Entity2);

em.flush();

[the result]
number of records inserted into test1: 5 // wrong, 6 expected!
number of records inserted into test2: 2

@mevivs
Collaborator
mevivs commented Mar 18, 2013

So are test1 and test2 entities on the same column family or on different column families (I assume different column families)? Just to clarify, the batch size is not specific to an entity; it applies to the complete application. For example, if the batch size is 5, an implicit flush will happen even if you persist 3 entities for test1 and 2 for test2!
This is expected behaviour. The same UUID should not be an issue. Am I clear on this?

-Vivek
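
The application-wide batch counter described above can be pictured with a small self-contained sketch. `BatchCounterSketch` and its methods are hypothetical names used only for illustration, not Kundera's implementation; the only point is that writes to different tables share one counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: NOT Kundera's actual batching code.
class BatchCounterSketch {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private int flushCount = 0;

    BatchCounterSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Queue one write. The pending list is shared by every table.
    void persist(String table, String id) {
        pending.add(table + ":" + id);
        if (pending.size() >= batchSize) {
            flush(); // implicit flush once the shared limit is reached
        }
    }

    // Send everything queued so far (simulated by clearing the list).
    void flush() {
        if (!pending.isEmpty()) {
            flushCount++;
            pending.clear();
        }
    }

    int flushCount() {
        return flushCount;
    }

    int pendingCount() {
        return pending.size();
    }

    // Batch size 5: three writes to test1 plus two to test2 reach the limit together.
    static BatchCounterSketch demo() {
        BatchCounterSketch client = new BatchCounterSketch(5);
        client.persist("test1", "id-1");
        client.persist("test1", "id-2");
        client.persist("test1", "id-3");
        client.persist("test2", "id-1");
        client.persist("test2", "id-2");
        return client;
    }
}
```

In this sketch the fifth write triggers the flush even though neither table alone reached five writes.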

@zhouhero

First, test1 and test2 are different tables.
Second, my batch size is 5000.
Yes, the same UUID cannot appear twice in one table, but the same UUID can appear in different tables.

@mevivs
Collaborator
mevivs commented Mar 18, 2013

Can you share your sample project to replicate this issue?

  1. 2 entities pointing to different column families
  2. Assigning same UUIDs to both entities and setting batch size to 5000
  3. Expected and actual result

-Vivek

@zhouhero

test1Entity1's ID is the same as test2Entity1's ID.
When I change test1Entity1's ID so that it differs from test2Entity1's,
the data is inserted correctly into the test1 table!

@mevivs
Collaborator
mevivs commented Mar 18, 2013

Please share your entity definition and CRUD code so that we can replicate this at our end.

-Vivek

@zhouhero

[my table]
create columnfamily test1 (attraction_id uuid primary key, del_flg int);
create columnfamily test2 (give_id uuid primary key, item_id int);

[my entity]

@Entity
@Table(name = "test1", schema = XXXX)
@XmlRootElement(name = "Test1Entity")
public class Test1Entity implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    @Column(name = "attraction_id")
    private UUID attractionId;

    @Column(name = "del_flg")
    private int delFlg;
    ...
}

@Entity
@Table(name = "test2", schema = XXXX)
@XmlRootElement(name = "Test2Entity")
public class Test2Entity implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    @Column(name = "give_id")
    private UUID giveId;

    @Column(name = "item_id")
    private int itemId;
    ...
}

[my insert code]
// set batchsize= 5000 in persistence.xml
...
EntityManager em = entityManagerFactory.createEntityManager();

Test1Entity test1Entity = new Test1Entity();
test1Entity.setAttractionId(UUID.fromString("13816710-1dd2-11b2-0000-242d50cf1ff7"));
test1Entity.setDelFlg(3);
em.persist(test1Entity);

test1Entity = new Test1Entity();
test1Entity.setAttractionId(UUID.fromString("13818e20-1dd2-11b2-0000-242d50cf1ffc"));
test1Entity.setDelFlg(5);
em.persist(test1Entity);

Test2Entity test2Entity = new Test2Entity();
test2Entity.setGiveId(UUID.fromString("13816710-1dd2-11b2-0000-242d50cf1ff7"));
test2Entity.setItemId(12);
em.persist(test2Entity);

em.flush();
em.close();

[the result]
test1 table record num: 1 // wrong, 2 expected!
test2 table record num:1

@zhouhero

I think you can reproduce this bug with the code above.
Please try it.

@mevivs
Collaborator
mevivs commented Mar 18, 2013

Hi,
I confirm this is a bug and will get it fixed very soon.

-Vivek

@zhouhero

Thank you for your help.
When and how can I get the fixed version?
I am using Kundera 2.2.1 now (I will move to 2.4 very soon).

@mevivs
Collaborator
mevivs commented Mar 19, 2013

This may take a couple of days (it will be released with 2.5). However, there is a workaround: insert test1 from one entity manager and test2 from another. I guess that should work.

-Vivek
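
To picture why separate entity managers might help, here is a self-contained sketch. It rests on an assumption the thread does not confirm: that pending writes inside one entity manager are tracked by primary key alone, so equal keys destined for different tables overwrite each other. `PendingWrites` is a hypothetical stand-in, not Kundera code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for one entity manager's pending batch.
// ASSUMPTION (not confirmed in this thread): rows are keyed by primary key only.
class PendingWrites {
    static final UUID A = UUID.fromString("13816710-1dd2-11b2-0000-242d50cf1ff7");
    static final UUID B = UUID.fromString("13818e20-1dd2-11b2-0000-242d50cf1ffc");

    private final Map<UUID, String> byId = new LinkedHashMap<>();

    // A later write with the same id replaces the earlier one, whatever its table.
    void persist(String table, UUID id) {
        byId.put(id, table);
    }

    // Number of pending rows that would be flushed to the given table.
    int countFor(String table) {
        int n = 0;
        for (String t : byId.values()) {
            if (t.equals(table)) {
                n++;
            }
        }
        return n;
    }

    // One shared buffer for both tables, as with a single entity manager.
    static int[] sharedCounts() {
        PendingWrites shared = new PendingWrites();
        shared.persist("test1", A);
        shared.persist("test1", B);
        shared.persist("test2", A); // same id as the first test1 row
        return new int[] { shared.countFor("test1"), shared.countFor("test2") };
    }

    // One buffer per table, analogous to one entity manager per table.
    static int[] separateCounts() {
        PendingWrites forTest1 = new PendingWrites();
        PendingWrites forTest2 = new PendingWrites();
        forTest1.persist("test1", A);
        forTest1.persist("test1", B);
        forTest2.persist("test2", A);
        return new int[] { forTest1.countFor("test1"), forTest2.countFor("test2") };
    }
}
```

Under that assumption the shared buffer yields 1 row for test1 and 1 for test2, which matches the counts reported above, while separate buffers yield 2 and 1. This agreement does not prove that it is the actual cause.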

@xamry
xamry commented Mar 26, 2013

This has been fixed and latest code has been pushed to trunk.

A new test case for batch insertion involving multiple entities has been added here:

https://github.com/impetus-opensource/Kundera/blob/trunk/kundera-cassandra/src/test/java/com/impetus/client/crud/batch/CassandraBatchProcessorMixedTest.java

Please test and let me know if you face any issue.

-Amresh

@xamry
xamry commented Apr 18, 2013

Did you get a chance to verify this? Can we close this one?

@mevivs
Collaborator
mevivs commented Apr 29, 2013

Releasing with 2.5

-Vivek

@mevivs
Collaborator
mevivs commented Jul 9, 2013

Fixed and released with 2.5

@mevivs mevivs closed this Jul 9, 2013