Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch spec from docker to locally run azurite instance #207

Merged
merged 9 commits into from
May 27, 2022

Conversation

Arkatufus
Copy link
Contributor

@Arkatufus Arkatufus commented Apr 18, 2022

  • Uses locally installed azurite using npm
  • Upgrade Windows.Azure.Cosmos.Table package to Azure.Data.Tables

NOTE:
Will fail until akkadotnet/akka.net#5849 is merged and released

@Arkatufus Arkatufus self-assigned this May 20, 2022
@Arkatufus Arkatufus added the dependencies Pull requests that update a dependency file label May 20, 2022

batchItems = batchItems.Add(newItem);
batchItems.Add(new TableTransactionAction(TableTransactionActionType.Add, newItem.WriteEntity()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New SDK way of building a batch data CRUD, everything is encapsulated inside a transaction-like batch

filter: $"PartitionKey eq '{AllPersistenceIdsEntry.PartitionKeyValue}'",
maxPerPage: maxPerPage,
@select: new[] { "RowKey" },
cancellationToken: cancellationToken
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest change in the SDK, it doesn't use an awkward expression builder anymore, it actually takes an OData query directly. We could also use a strongly typed Expression<Func<bool>> linq-like builder, but that turns out to be problematic for building string comparison queries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little cheat-sheet on OData:

  • eq is equals
  • ne is not equals
  • lt is less than
  • le is less than or equal
  • gt is greater than
  • ge is greater than or equal

$"RowKey le '{toSequenceNr.ToJournalRowKey()}'",
maxPerPage: maxPerPage,
@select: null,
cancellationToken: cancellationToken);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing this and the old way of building queries, you can see that this is a lot more clear and concise.

}
}

private async Task<bool> IsTableExist(string name, CancellationToken token)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Table.Exists method was removed by design in the new SDK, so we need to roll our own here.


namespace Akka.Persistence.Azure.Journal
{
internal sealed class HighestSequenceNrEntry
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is a duplicate of Akka.Persistence.Azure.TableEntities.HighestSequenceNrEntry, deleting it so that it doesn't cause any bug in the future.

{
private const string ManifestKeyName = "manifest";
public const string PartitionKeyValue = "allPersistenceIdsIdx";

// In order to use this in a TableQuery a parameterless constructor is required
public AllPersistenceIdsEntry()
public AllPersistenceIdsEntry(TableEntity entity)
Copy link
Contributor Author

@Arkatufus Arkatufus May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of relying on the SDK deserialization code, we will use the EventHubs generic TableEntity serialization and populate our own data from it.
By doing this, we can convert all of the table entity data model back to immutable classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean Table Storage generic serialization, but your point is well taken.

public string[] Tags { get; set; }

public DateTimeOffset Timestamp { get; set; }
public string[] Tags { get; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This property is the main reason why I opted not to use the built-in serializer. In the strongly typed json based serializer, this would be saved as a JSON array, while we traditionally have been saving it as a CSV. Trying to coerce the internal JSON serializer to serialize this as CSV would be a mess.

@Aaronontheweb Aaronontheweb linked an issue May 25, 2022 that may be closed by this pull request
Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite done reviewing yet, but left some thoughts / questions.

Very interested in seeing if we can get #129 (comment) resolved as well.

build-system/azure-pipeline.template.yaml Show resolved Hide resolved
Initialize();
}

[WindowsFact(SkipUnixReason = "Batch delete is not supported by Azurite in Linux")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


public static Config AzureConfig(Func<string, Config> configTemplate)
{
var connString = Environment.GetEnvironmentVariable("AZURE_CONNECTION_STR");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

foreach (var r in eventTagsResults)
_log.Debug("Azure table storage wrote entity [{0}] with status code [{1}]", r.Etag, r.HttpStatusCode);
foreach (var r in eventTagsResponse.Value)
_log.Debug("Azure table storage wrote entity with status code [{1}]", r.Status);

if (HasTagSubscribers && taggedEntries.Count != 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we get rid of the tag subscribers stuff inside the write loop? Seems out of place given that we still have to poll for changes on the table to detect changes that happen exclusively on other nodes in the network. This is something that got copied and pasted into a lot of journals but it was only ever really appropriate for SQLite / in-memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should, should we open another issue and PR for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should

@@ -453,7 +411,7 @@ protected override void PreStart()
*
* Either everything fails or everything succeeds is the idea I guess.
*/
return exceptions.Any(ex => ex != null) ? exceptions : null;
return exceptions.Any(ex => ex != null) ? exceptions.ToImmutableList() : null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch

@Arkatufus
Copy link
Contributor Author

To compute the highest sequence number, is that a constant time operation (peeking the top of the partition) or do we have to scan a portion of it? Been a while since I've looked at this code.

The query will grab all persistence id entries filtered to their last row key, then the code will compare all of those entries for the highest sequence number. In theory, this will be slow for journal tables with a lot of persistence ids.

@Arkatufus
Copy link
Contributor Author

That's just to shave off some latency?

Yes, the hack shaves some query over HTTP latency for the next page in the query.

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

var returnValue = new TableQuery<HighestSequenceNrEntry>().Where(filter);

return returnValue;
return Table.QueryAsync<TableEntity>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a constant time query since it looks up a specific table entry for each entity that stores the highestSequenceNr

get
{
if (_tableStorage_DoNotUseDirectly == null)
throw new Exception("Table storage has not been initialized yet. PreStart() has not been invoked");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

foreach (var r in eventTagsResults)
_log.Debug("Azure table storage wrote entity [{0}] with status code [{1}]", r.Etag, r.HttpStatusCode);
foreach (var r in eventTagsResponse.Value)
_log.Debug("Azure table storage wrote entity with status code [{1}]", r.Status);

if (HasTagSubscribers && taggedEntries.Count != 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should

var encodedKey = PartitionKeyEscapeHelper.Escape(x.Key);
allPersistenceIdsBatch.InsertOrReplace(new AllPersistenceIdsEntry(encodedKey));
});
allPersistenceIdsBatch.Add(new TableTransactionAction(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a comment here explaining that this is updating a specific object that stores the highest seqNo used in recovery

{
private const string ManifestKeyName = "manifest";
public const string PartitionKeyValue = "allPersistenceIdsIdx";

// In order to use this in a TableQuery a parameterless constructor is required
public AllPersistenceIdsEntry()
public AllPersistenceIdsEntry(TableEntity entity)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean Table Storage generic serialization, but your point is well taken.

Timestamp = Timestamp,
[PayloadKeyName] = Payload,
[SeqNoKeyName] = SeqNo,
[TagsKeyName] = string.Join(";", Tags),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks the same as before, good job

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate test suite to Azurite instead of old azure emulator image
2 participants