
Problem with Bulk Inserts And Unique Constraint Violations #3847

Closed
msullivan opened this issue May 10, 2022 Discussed in #3839 · 0 comments · Fixed by #3848
@msullivan (Member) commented:
Discussed in #3839

Originally posted by ngriffiths13 May 7, 2022
I am having some issues trying to do a bulk insert of my data using the built-in for loop as shown here. In my schema I have an exclusive constraint on one of the properties of my types. If I try to bulk insert any data into my DB, I get a constraint violation error even though I am very clearly not violating the constraint. Am I doing something wrong with my insert?

Here is my schema:

module default {
    type Tweet {
        annotation title := "Tweet object from tweepy.";
        annotation description := "All scraped data from a tweet from Twitter API 2.0.";

        required property tweet_id -> int64 {
            constraint exclusive;
        }
        required property tweet_text -> str {
            annotation description := "The body of text from a tweet.";
        }
        required property created_at -> datetime;
        required link account -> Account {
            on target delete delete source;
        }
        link traffic_report := .<source_tweet[is TrafficReport];
        link metrics -> TweetMetrics;
    }
    type TweetMetrics {
        annotation description := 'Publicly available tweet metrics.';
        property retweet_count -> int64 {
            constraint min_value(0);
        }
        property reply_count -> int64 {
            constraint min_value(0);
        }
        property like_count -> int64 {
            constraint min_value(0);
        }
        property quote_count -> int64 {
            constraint min_value(0);
        }
    }
    type Account {
        required property account_id -> int64 {
            constraint exclusive;
        }
        required property name -> str {
            constraint exclusive;
        }
        required property username -> str {
            constraint exclusive;
        }
        multi link tweets := .<account[is Tweet];
    }
}

And here is my insert query:

WITH
    raw_data := <json>$data
FOR data IN json_array_unpack(raw_data) UNION (
    INSERT Tweet {
        account := (SELECT Account FILTER .username = <str>data['username']),
        metrics := (INSERT TweetMetrics {
            like_count := <int64>data['like_count'],
            quote_count := <int64>data['quote_count'],
            reply_count := <int64>data['reply_count'],
            retweet_count := <int64>data['retweet_count'],
        }),
        tweet_id := <int64>data['tweet_id'],
        tweet_text := <str>data['text'],
        created_at := <datetime>data['created_at']
    }
    UNLESS CONFLICT ON .tweet_id
);

From an empty database, I insert one account, elonmusk. Then, trying to insert tweet data, I get the error. Here is what my data looks like (just 2 entries, but enough to reproduce the error):

'[{"username": "elonmusk", "tweet_id": 1523046265097158656, "text": "@SawyerMerritt Yes! Stop the outrageous false ESG assessments, where Tesla gets a bad grade, but an oil company can get a good grade. Total gaming of the system!", "created_at": "2022-05-07 21:05:01+00:00", "like_count": 14203, "quote_count": 83, "reply_count": 758, "retweet_count": 1499}, {"username": "elonmusk", "tweet_id": 1523045544536723456, "text": "@WholeMarsBlog At risk of stating the obvious, unless something changes to cause the birth rate to exceed the death rate, Japan will eventually cease to exist. This would be a great loss for the world.", "created_at": "2022-05-07 21:02:09+00:00", "like_count": 38529, "quote_count": 768, "reply_count": 3315, "retweet_count": 4140}]'
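As a sanity check that the batch itself contains no duplicates, the payload above can be verified with a quick stdlib-only sketch (the list below is abbreviated to the fields that matter here; the full payload is the JSON string above):

```python
import json

# Abbreviated copy of the two-tweet payload from above.
tweets = json.dumps([
    {"username": "elonmusk", "tweet_id": 1523046265097158656},
    {"username": "elonmusk", "tweet_id": 1523045544536723456},
])

# Confirm every tweet_id in the batch is unique, so the batch alone
# cannot legitimately trip the exclusive constraint.
ids = [t["tweet_id"] for t in json.loads(tweets)]
assert len(ids) == len(set(ids))
```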

I insert them using the Python client library like this (after connecting), where query is the query above and tweets is the JSON string:
conn.query(query, data=tweets)

I have found that if I run this query with only 1 entry it works, but 2 or more fails. If I run it with 2 entries but 1 is already in the DB, it succeeds. It fails with this error:

~/personal/traffic-project-v2/traffic_project/etl/data_utils.py in import_user_tweet(client, tweet)
     45         );
     46         """
---> 47     client.query(query, data=json.dumps(tweet))
     48 
     49 

~/personal/traffic-project-v2/.venv/lib/python3.9/site-packages/edgedb/abstract.py in query(self, query, *args, **kwargs)
     95 
     96     def query(self, query: str, *args, **kwargs) -> datatypes.Set:
---> 97         return self._query(QueryContext(
     98             query=QueryWithArgs(query, args, kwargs),
     99             cache=self._get_query_cache(),

~/personal/traffic-project-v2/.venv/lib/python3.9/site-packages/edgedb/blocking_client.py in _query(self, query_context)
    314 
    315     def _query(self, query_context: abstract.QueryContext):
--> 316         return self._iter_coroutine(super()._query(query_context))
    317 
    318     def execute(self, query: str) -> None:

~/personal/traffic-project-v2/.venv/lib/python3.9/site-packages/edgedb/blocking_client.py in _iter_coroutine(self, coro)
    303     def _iter_coroutine(self, coro):
    304         try:
--> 305             coro.send(None)
    306         except StopIteration as ex:
    307             if ex.args:

~/personal/traffic-project-v2/.venv/lib/python3.9/site-packages/edgedb/base_client.py in _query(self, query_context)
    681         con = await self._impl.acquire()
    682         try:
--> 683             result, _ = await con.raw_query(query_context)
    684             return result
    685         finally:

~/personal/traffic-project-v2/.venv/lib/python3.9/site-packages/edgedb/base_client.py in raw_query(self, query_context)
    212                     raise
    213                 if not e.has_tag(errors.SHOULD_RETRY):
--> 214                     raise e
    215                 if capabilities is None:
    216                     cache_item = query_context.cache.query_cache.get(

~/personal/traffic-project-v2/.venv/lib/python3.9/site-packages/edgedb/base_client.py in raw_query(self, query_context)
    197                 if reconnect:
    198                     await self.connect(single_attempt=True)
--> 199                 return await self._protocol.execute_anonymous(
    200                     query=query_context.query.query,
    201                     args=query_context.query.args,

~/personal/traffic-project-v2/.venv/lib/python3.9/site-packages/edgedb/protocol/protocol.pyx in execute_anonymous()

~/personal/traffic-project-v2/.venv/lib/python3.9/site-packages/edgedb/protocol/protocol.pyx in _execute()

ConstraintViolationError: tweet_id violates exclusivity constraint

Sorry for all of the code snippets, but any help in solving this would be much appreciated. I'm not sure if this is a bug or just user error on my part.
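Until a server-side fix lands, one possible client-side workaround (a sketch; conn and query here are the reporter's own objects from the call shown above, not a confirmed API beyond query(query, data=...)) is to split the bulk payload into one-element payloads, so the FOR loop only ever iterates over a single item per call:

```python
import json

def single_row_payloads(batch_json: str) -> list[str]:
    """Split one bulk JSON payload into one-element JSON payloads,
    so each tweet can be inserted by its own query call."""
    return [json.dumps([tweet]) for tweet in json.loads(batch_json)]

# Hypothetical usage, mirroring the reporter's call above:
# for payload in single_row_payloads(tweets):
#     conn.query(query, data=payload)
```

This trades one round trip for N, but each statement's UNLESS CONFLICT then behaves as it does in the working single-entry case the reporter describes.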

@msullivan msullivan self-assigned this May 10, 2022
msullivan added a commit that referenced this issue May 10, 2022
Since the relevant rel wasn't in the hierarchy, the search for the
enclosing volatility ref wasn't succeeding, and when nonconflict CTEs
were used in a for loop, sometimes things could get
duplicated. (Depending on the vagaries of how pg orders the loops, I
think).

Fixes #3847.