Bulk inserting is not working #15

Open · th0th opened this issue Oct 12, 2019 · 22 comments

th0th commented Oct 12, 2019

Hi, first of all, thank you for making clickhouse-bulk 💐

I am running it with this config:

{
  "listen": ":8125",
  "flush_count": 10000,
  "flush_interval": 3000,
  "debug": true,
  "dump_dir": "dumps",
  "clickhouse": {
    "down_timeout": 300,
    "servers": [
      "http://127.0.0.1:8123"
    ]
  }
}

Shouldn't this config collect incoming requests and insert them in bulk every 3 seconds? I am watching the logs and seeing every insert I send over HTTP (I am inserting with Python's requests) being processed immediately as I send it. What am I doing wrong?
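
For reference, a minimal sender loop to exercise the batching might look like this (the endpoint and the events table are placeholders for my setup):

import requests

# Placeholder endpoint matching the "listen" address in the config above.
BULK_URL = 'http://127.0.0.1:8125'

for i in range(100):
    query = "INSERT INTO events (id, value) VALUES ({}, {})".format(i, i * 2)
    # Expectation: clickhouse-bulk buffers these and flushes one combined
    # INSERT every flush_interval (3000 ms) or once flush_count (10000)
    # rows accumulate, whichever comes first.
    requests.post(url=BULK_URL, params={'query': query})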


nikepan commented Oct 12, 2019

That's interesting. I will write a test for it. Thanks.


th0th commented Oct 12, 2019

Is there anything I can provide to help you reproduce it? I tried both the Docker image and the binary on GNU/Linux (Alpine). I also tried configuring the Docker image via environment variables. The result didn't change.


nikepan commented Oct 15, 2019

Did you send INSERT or SELECT queries? If INSERT, to the same table or to different ones?


th0th commented Oct 15, 2019

Hey @nikepan, my code is like this:

import requests

# Example values; in the real code these come from the workers.
field1 = 'value1'
field2 = 'value2'

query = """
    INSERT INTO table
    (field1, field 2)
    VALUES
    ('{field1}', '{field2}');
    """.format(
        field1=field1,
        field2=field2,
    )

requests.post(
    url='http://clickhouse-bulk-url',
    params={
        'query': query,
    },
)

I have many workers on different servers/Docker containers doing the same thing; my expectation was that clickhouse-bulk would collect the incoming inserts in memory and then insert them into the table in bulk.


nikepan commented Oct 15, 2019

Yes, I caught the bug. I don't know how it ever worked. I will publish a new version soon, in which everything works fine and dumps are resent automatically. Thank you!


th0th commented Oct 15, 2019

Thanks, looking forward to the new release! Do you have a date in mind?


nikepan commented Oct 15, 2019

It's because you send a multi-line query. At the moment clickhouse-bulk supports only single-line queries. I will try to fix this in the next release.


th0th commented Oct 15, 2019

> It's because you send a multi-line query. At the moment clickhouse-bulk supports only single-line queries. I will try to fix this in the next release.

Oh, interesting. Would it work if I just replaced the \n characters with spaces, then?
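
Something like this, assuming the query is built as in my snippet above:

# Collapse all whitespace runs (including newlines) into single
# spaces so the query reaches clickhouse-bulk as one line.
query = ' '.join(query.split())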


nikepan commented Oct 15, 2019

Yes, I checked it:

query = "INSERT INTO table (field1, field 2) VALUES ('{field1}', '{field2}');".format(
    field1='f1',
    field2='f2',
)
⇨ http server started on [::]:8124
2019/10/15 21:52:40 send 3 rows to http://127.0.0.1:8123 of query=INSERT+INTO+table+%28field1%2C+field+2%29+VALUES
2019/10/15 21:52:42 Send ERROR 502: No working clickhouse server


th0th commented Oct 15, 2019

Awesome! I can go with that workaround until you release the new version. Thanks a lot @nikepan!


th0th commented Oct 15, 2019

I think I hit another possible bug. I converted my multi-line inserts to single-line ones.

Even though I see this log:

2019/10/15 21:22:08 send 11 rows to http://clickhouse:8123 of query=...

I still see only a single row being inserted into the actual table.


nikepan commented Oct 15, 2019

I need an example of your request. Rows are counted by the \n character.
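
Roughly, the counting works like this (a sketch of the idea in Python, not the actual Go source):

def count_rows(values_part):
    # Each non-empty newline-separated chunk after the VALUES
    # keyword is treated as one row.
    return len([line for line in values_part.split('\n') if line.strip()])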


th0th commented Oct 15, 2019

Here is the code:

import requests


query = "INSERT INTO data (id, date_time, item_id, is_successful, detail, time1, time2, time3, time4, time5, time6, time7) VALUES ('bf58a4af-0e56-4e07-91d9-bbf70917e0da', '2019-10-15 21:36:43', 10, 1, 'OK', 8, 49, 99, 99, 222, 0, 377);"

requests.post(
    url='http://clickhouse-bulk-url',
    params={
        'query': query,
    },
)


th0th commented Oct 15, 2019

A testable example:

import requests


query = "INSERT INTO data (id, date_time, item_id, is_successful, detail, time1, time2, time3, time4, time5, time6, time7) VALUES ('bf58a4af-0e56-4e07-91d9-bbf70917e0da', '2019-10-15 21:36:43', 10, 1, 'OK', 8, 49, 99, 99, 222, 0, 377);"

for i in range(10):
    requests.post(
        url='http://clickhouse-bulk-url',
        params={
            'query': query,
        },
    )

Here is the log:

2019/10/15 21:45:28 send 10 rows to http://clickhouse:8123 of query=...

And yet I see only 1 row inserted into the database.


nikepan commented Oct 16, 2019

You need to remove the ';' at the end of the line. That's a bug too :) Thanks for the report.


nikepan commented Oct 16, 2019

For speed, clickhouse-bulk parses query strings very simply, so it doesn't support many variations :)
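
Until this is fixed, the safest client-side workaround is to normalize queries before sending, e.g.:

def normalize(query):
    # Single line, no trailing semicolon: the form the simple
    # parser currently expects.
    return ' '.join(query.split()).rstrip(';')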


th0th commented Oct 16, 2019

No problem :) I will try without the semicolon and check again. Thanks.


a1tus commented Feb 16, 2020

@nikepan, hello. It seems we've hit this issue too.
Can you clarify whether it is fixed in v1.2.*? It was released after your comments here, but there's no mention of it in the changelog.


nikepan commented Feb 20, 2020

> Can you clarify whether it is fixed in v1.2.*? It was released after your comments here, but there's no mention of it in the changelog.

No, it is not fixed. I ran into some problems with these changes. I will try to fix it later, but not right now.


a1tus commented Feb 20, 2020

OK, thanks. At least now we know the exact status :)


rsurgiewicz commented May 2, 2020

Hello,
Are these issues fixed in the latest Docker image?
I'm trying to use the tool, but despite

CLICKHOUSE_FLUSH_INTERVAL=10000

I'm getting

send 1 row .... 99%

and from time to time

send 2 row .... 99%

My inserts come from an external system and use JSONEachRow into 3 tables.
Is this case supposed to work, or should I run a separate clickhouse-bulk instance per table?

Thanks!


rsurgiewicz commented May 3, 2020

OK, nailed it!
There was an extra space before the FORMAT keyword in my requests.
After removing it, batch inserts work!
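
For anyone hitting the same thing, a working shape for the request is something like this (table and fields are placeholders; note the single space before FORMAT):

import requests

query = 'INSERT INTO metrics (ts, value) FORMAT JSONEachRow'
rows = '{"ts": "2020-05-03 12:00:00", "value": 42}'

requests.post(
    url='http://127.0.0.1:8125',
    params={'query': query},
    data=rows,
)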
