After process killed sequence_number start from beginning #78

Closed
msrdic opened this issue Nov 26, 2013 · 3 comments

@msrdic
Contributor

msrdic commented Nov 26, 2013

I'm noticing some strange behaviour after I force kill (and then automatically restart) the influxdb process. I'm using the latest brew version on OS X Mavericks.

I'm working on a Java client library for influxdb and I'm trying to write some fairly generic data (I'm writing this data with time values set to 0). Here's the scenario:

  • write data to series_1: points get unique sequence_number values from 1 to N
  • write some more data to series_2: points get unique sequence_number values from N+1 to M

Now I force kill the process and restart it.

  • write some data to series_2: points get sequence_number values from 1 to N, so they're still unique within series_2 (note that the same sequence_number values exist in series_1)

Force kill it again and restart. Write some data to series_2 and here's where it gets interesting:

  • if I try to write 4 data points with default values for sequence_number and time (0 for both in this case, since the version I use doesn't support writing null values), the existing points in series_2 with sequence_number values 1, 2, 3 and 4 get updated. These points already had time set to 0 from the first write, so they appear to be updated purely because the time values match (at least that's what it looks like to me). If I then change the default value for time to the current time, writing another 4 entries to series_2 creates entries with duplicate sequence_number values in series_2, but with a different time value.
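The scenario above can be reproduced with a toy model. This is only a sketch under the assumption that the server keeps a single in-memory counter shared by all series; `ToyServer` and its `write` method are hypothetical names for illustration, not InfluxDB code.

```python
import itertools


class ToyServer:
    """Hypothetical stand-in for the server: one in-memory counter for all series."""

    def __init__(self):
        # The counter lives only in memory, so it is lost when the process dies.
        self._next = itertools.count(1)

    def write(self, series, n_points):
        # Assign the next sequence numbers to the points written to `series`.
        return [(series, next(self._next)) for _ in range(n_points)]


server = ToyServer()
first = server.write("series_1", 3)   # sequence numbers 1..3
second = server.write("series_2", 2)  # sequence numbers 4..5

server = ToyServer()                  # force kill + restart: counter state is gone
after_restart = server.write("series_2", 3)  # sequence numbers start at 1 again
```

In this model the numbers written after the restart are still unique within series_2, but they repeat values already used in series_1, which matches the first restart described above.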

Here's the sample data:

test> select  name, pay from gringotts_devs_5;
┌───────────────┬─────────────────┬────────┬──────┐
│ time          │ sequence_number │ name   │ pay  │
├───────────────┼─────────────────┼────────┼──────┤
│ 1385463465421 │ 16              │ Deda   │ 6666 │
│ 1385463465421 │ 15              │ Dejan  │ 7777 │
│ 1385463465421 │ 14              │ Mladen │ 7777 │
│ 1385463465421 │ 13              │ Sasa   │ 7777 │
│               │ 24              │ Deda   │ 6666 │
│               │ 23              │ Dejan  │ 6666 │
│               │ 22              │ Mladen │ 6666 │
│               │ 21              │ Sasa   │ 6666 │
│               │ 20              │ Deda   │ 6666 │
│               │ 19              │ Dejan  │ 6666 │
│               │ 18              │ Mladen │ 6666 │
│               │ 17              │ Sasa   │ 6666 │
│               │ 16              │ Deda   │ 6666 │
│               │ 15              │ Dejan  │ 6666 │
│               │ 14              │ Mladen │ 6666 │
│               │ 13              │ Sasa   │ 6666 │
│               │ 12              │ Deda   │ 6666 │
│               │ 11              │ Dejan  │ 6666 │
│               │ 10              │ Mladen │ 6666 │
│               │ 9               │ Sasa   │ 6666 │
│               │ 8               │ Deda   │ 6666 │
│               │ 7               │ Dejan  │ 7777 │
│               │ 6               │ Mladen │ 7777 │
│               │ 5               │ Sasa   │ 7777 │
│               │ 4               │ Deda   │ 6666 │
│               │ 3               │ Dejan  │ 6666 │
│               │ 2               │ Mladen │ 6666 │
│               │ 1               │ Sasa   │ 6666 │
└───────────────┴─────────────────┴────────┴──────┘

Is this expected behaviour or a known bug regarding the uniqueness of sequence_number?

@pauldix
Member

pauldix commented Nov 26, 2013

It's a known issue with sequence number. It's not currently being persisted so it's not guaranteed unique across restarts. It's actually fixed in this branch: #20, which we'll be merging in early next week. In that branch the sequence number is guaranteed to be unique across the cluster even with node restarts.
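As a rough illustration of what persisting the sequence number means, here is a minimal single-node sketch. It assumes the counter is simply written to a file after every increment; `PersistentCounter` and the file name are hypothetical, and this is not how branch #20 works (it says nothing about uniqueness across a cluster).

```python
import os
import tempfile


class PersistentCounter:
    """Hypothetical counter that survives restarts by storing its value on disk."""

    def __init__(self, path):
        self.path = path
        self.value = 0
        if os.path.exists(path):
            # Resume from the last value written before the process stopped.
            with open(path) as f:
                self.value = int(f.read().strip() or 0)

    def next(self):
        self.value += 1
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(self.value))
            f.flush()
            os.fsync(f.fileno())
        # Replace the old file in one step so a crash never leaves a partial value.
        os.replace(tmp, self.path)
        return self.value


state_dir = tempfile.mkdtemp()
path = os.path.join(state_dir, "sequence_number")

counter = PersistentCounter(path)
before = [counter.next() for _ in range(3)]  # 1, 2, 3

counter = PersistentCounter(path)            # simulated restart
after = [counter.next() for _ in range(2)]   # continues at 4, 5
```

Syncing on every increment is the simplest choice for a sketch; a real server would more likely reserve numbers in batches to avoid a disk write per point.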

The behavior you're seeing, where a write updates a point if one with the same time and sequence number already exists, is expected. Even though sequence numbers are unique, points are uniquely identified by their database, series, time, and sequence number. So if you post a point with a time and sequence number, it'll update or insert depending on whether it exists. I know you're not sending sequence numbers, but because it's reusing old numbers it's updating them.
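The update-or-insert rule can be sketched with a dictionary keyed by the four identifying fields. This is only an illustration of the described semantics; `write_point` and the in-memory `points` store are hypothetical, not the server's storage code.

```python
# Hypothetical in-memory store: one entry per (database, series, time, sequence_number).
points = {}


def write_point(database, series, time, sequence_number, values):
    """Insert the point, or overwrite it if the same identity already exists."""
    key = (database, series, time, sequence_number)
    existed = key in points
    points[key] = values
    return "updated" if existed else "inserted"


# First write with time 0 and a (reused) sequence number 1.
r1 = write_point("test", "series_2", 0, 1, {"name": "Sasa", "pay": 7777})
# Same time and sequence number after a restart: the existing point is overwritten.
r2 = write_point("test", "series_2", 0, 1, {"name": "Sasa", "pay": 6666})
# Same sequence number but a different time: a separate point is stored.
r3 = write_point("test", "series_2", 1385463465421, 1, {"name": "Sasa", "pay": 7777})
```

The third write shows why switching to the current time produced entries with a repeated sequence_number instead of an update: the time is part of the point's identity.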

We'll keep this issue open until we merge that PR in.

@pauldix
Member

pauldix commented Nov 26, 2013

btw, any chance you'll be making your java library available?

@msrdic
Contributor Author

msrdic commented Nov 26, 2013

Thanks for the quick reply, I'll be watching for the merge next week.

Currently, the library mainly depends on internal libraries at my company, but I think I will be able to make it available as soon as it's completely functional (it's still at an early stage) and free of those dependencies.

pauldix closed this as completed in ec93c7f on Dec 3, 2013
jvshahid pushed a commit that referenced this issue on Aug 12, 2014: Change binary encoding to protobuf