feat: introduce messenger APIs to extract discord channels #2752

Merged
merged 1 commit into from
Aug 4, 2022

Conversation

0x-r4bbit
Member

@0x-r4bbit 0x-r4bbit commented Jul 13, 2022

As part of the new Discord <-> Status Community Import functionality,
we're adding an API that extracts all discord categories and channels
from a previously exported discord export file.

These APIs can be used in clients to show the user what categories and
channels will be imported later on.

There are two APIs:

  1. Messenger.ExtractDiscordCategoriesAndChannels(filesToImport []string) (*MessengerResponse, map[string]*discord.ImportError)

    This takes a list of exported Discord (JSON) files (typically one per
    channel), reads them, and extracts the categories and channels into
    dedicated data structures ([]DiscordChannel and []DiscordCategory).

    It also returns the oldest message timestamp found in all extracted
    channels.

    The API is synchronous and returns the extracted data as
    a *MessengerResponse. This makes it possible to expose the API through
    status-go's RPC interface.

    Errors are returned as a map[string]*discord.ImportError, where each
    key is the path of a JSON file that we tried to extract data from and
    each value is a discord.ImportError holding an error message and an
    error code. The code allows clients to distinguish between "critical"
    and "non-critical" errors.

  2. Messenger.RequestExtractDiscordCategoriesAndChannels(filesToImport []string)

    This is the asynchronous counterpart to
    ExtractDiscordCategoriesAndChannels. It was added because Discord
    servers can have a lot of message and channel data, which causes
    ExtractDiscordCategoriesAndChannels to block the thread for too long
    and can make apps feel like they are stuck.

    This API runs inside a goroutine, eventually calls
    ExtractDiscordCategoriesAndChannels, and then emits a newly
    introduced DiscordCategoriesAndChannelsExtractedSignal that clients
    can react to.

    Failure of extraction has to be determined by the
    discord.ImportErrors emitted by the signal.
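The relationship between the two APIs can be sketched roughly as follows. This is a minimal illustration, not the status-go implementation: `ExtractResult`, `Signal`, `extractSync` and `requestExtract` are hypothetical names, the error code value is made up, and a Go channel stands in for the signal mechanism that clients subscribe to.

```go
package main

import "fmt"

// ImportError mirrors the shape described above: an error code plus a message.
type ImportError struct {
	Code    uint
	Message string
}

// ExtractResult is a hypothetical stand-in for the data carried by *MessengerResponse.
type ExtractResult struct {
	Channels []string
}

// Signal is a hypothetical stand-in for the payload of the extracted signal.
type Signal struct {
	Result *ExtractResult
	Errors map[string]*ImportError
}

// extractSync plays the role of the synchronous API: it processes every file
// and records one error per failing file path instead of aborting.
func extractSync(files []string) (*ExtractResult, map[string]*ImportError) {
	result := &ExtractResult{}
	errs := map[string]*ImportError{}
	for _, f := range files {
		if f == "" {
			// Code 1 is an invented value for this sketch.
			errs[f] = &ImportError{Code: 1, Message: "empty file path"}
			continue
		}
		result.Channels = append(result.Channels, f)
	}
	return result, errs
}

// requestExtract plays the role of the asynchronous API: it runs extractSync
// in a goroutine and delivers exactly one Signal on the returned channel.
func requestExtract(files []string) <-chan Signal {
	out := make(chan Signal, 1)
	go func() {
		result, errs := extractSync(files)
		out <- Signal{Result: result, Errors: errs}
	}()
	return out
}

func main() {
	sig := <-requestExtract([]string{"general.json", ""})
	fmt.Println("channels:", len(sig.Result.Channels), "errors:", len(sig.Errors))
}
```

The caller returns immediately from `requestExtract` and only blocks when it chooses to read the signal, which is the property the asynchronous API is meant to provide.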

A note about exported discord history files

We expect users to export their Discord histories via the
DiscordChatExporter
(https://github.com/Tyrrrz/DiscordChatExporter/wiki/GUI%2C-CLI-and-Formats-explained#exportguild)
tool. The tool can export the data in different formats, such as
JSON, HTML and CSV.

We expect users to have their data exported as JSON.

@ghost

ghost commented Jul 13, 2022

Pull Request Checklist

  • Have you updated the documentation, if impacted (e.g. docs.status.im)?
  • Have you tested changes with mobile?
  • Have you tested changes with desktop?

type DiscordExportedChannel struct {
Channel DiscordChannel `json:"channel"`
Messages []DiscordMessage `json:"messages"`
}
Member Author

Chances are, these types will be extended with additional properties as soon as I start working on converting discord messages to Waku messages.
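To illustrate how a struct like `DiscordExportedChannel` maps onto an exported JSON file, here is a small decoding sketch. Only the `channel` and `messages` JSON keys come from the snippet above; the fields inside `DiscordChannel` and `DiscordMessage`, the `parseExport` helper and the sample document are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DiscordChannel and DiscordMessage use assumed fields for this sketch.
type DiscordChannel struct {
	ID       string `json:"id"`
	Category string `json:"category"`
	Name     string `json:"name"`
}

type DiscordMessage struct {
	ID        string `json:"id"`
	Timestamp string `json:"timestamp"`
	Content   string `json:"content"`
}

// DiscordExportedChannel matches the struct shown in the diff.
type DiscordExportedChannel struct {
	Channel  DiscordChannel   `json:"channel"`
	Messages []DiscordMessage `json:"messages"`
}

// parseExport is a hypothetical helper that decodes one exported file.
func parseExport(data []byte) (*DiscordExportedChannel, error) {
	var exported DiscordExportedChannel
	if err := json.Unmarshal(data, &exported); err != nil {
		return nil, fmt.Errorf("export is not parsable: %w", err)
	}
	return &exported, nil
}

// sample is an invented document in the assumed export shape.
const sample = `{
  "channel": {"id": "1", "category": "Community", "name": "general"},
  "messages": [
    {"id": "10", "timestamp": "2022-07-13T09:49:48Z", "content": "hello"},
    {"id": "11", "timestamp": "2022-07-13T09:50:52Z", "content": "world"}
  ]
}`

func main() {
	exported, err := parseExport([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(exported.Channel.Category, exported.Channel.Name, len(exported.Messages))
}
```

Unknown JSON keys are ignored by `json.Unmarshal`, so extending the structs with additional properties later does not break decoding of existing files.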

}()
}

func (m *Messenger) ExtractDiscordChannelsAndCategories(filesToImport []string) (*MessengerResponse, error) {
Member Author

This function I'll probably move into protocol/discord_importer

@status-im-auto
Member

status-im-auto commented Jul 13, 2022

Jenkins Builds

Click to see older builds (34)
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 77816be #1 2022-07-13 09:49:48 ~3 min linux 📦zip
✔️ 77816be #1 2022-07-13 09:50:52 ~4 min ios 📦zip
✔️ 77816be #1 2022-07-13 09:51:09 ~5 min android 📦aar
✔️ d439b19 #2 2022-07-13 10:33:31 ~2 min ios 📦zip
✔️ d439b19 #2 2022-07-13 10:34:41 ~3 min android 📦aar
✔️ d439b19 #2 2022-07-13 10:38:22 ~7 min linux 📦zip
✔️ 18e9d7b #3 2022-07-13 11:30:09 ~1 min linux 📦zip
✔️ 18e9d7b #3 2022-07-13 11:30:47 ~2 min ios 📦zip
✔️ 18e9d7b #3 2022-07-13 11:34:49 ~6 min android 📦aar
✔️ 2cbe942 #4 2022-07-19 09:40:05 ~2 min linux 📦zip
✔️ 2cbe942 #4 2022-07-19 09:42:30 ~4 min android 📦aar
✔️ 2cbe942 #4 2022-07-19 09:43:42 ~6 min ios 📦zip
e10cb66 #5 2022-07-19 10:49:38 ~48 sec linux 📄log
e10cb66 #5 2022-07-19 10:49:55 ~1 min android 📄log
e10cb66 #5 2022-07-19 10:52:32 ~3 min ios 📄log
✔️ 7cc7adb #6 2022-07-19 10:51:49 ~2 min linux 📦zip
✔️ 7cc7adb #6 2022-07-19 10:53:36 ~3 min android 📦aar
✔️ 7cc7adb #7 2022-07-19 10:53:38 ~1 min linux 📦zip
✔️ 7cc7adb #6 2022-07-19 10:55:42 ~3 min ios 📦zip
✔️ 9ef4407 #8 2022-07-19 12:04:39 ~1 min linux 📦zip
✔️ 9ef4407 #7 2022-07-19 12:05:18 ~2 min ios 📦zip
✔️ 9ef4407 #7 2022-07-19 12:06:29 ~3 min android 📦aar
✔️ efa25df #9 2022-07-19 12:43:46 ~1 min linux 📦zip
✔️ efa25df #8 2022-07-19 12:44:29 ~2 min ios 📦zip
✔️ efa25df #8 2022-07-19 12:45:26 ~3 min android 📦aar
✔️ 7de6cd2 #10 2022-07-21 19:14:33 ~1 min linux 📦zip
✔️ 7de6cd2 #9 2022-07-21 19:14:53 ~2 min ios 📦zip
✔️ 7de6cd2 #9 2022-07-21 19:16:46 ~3 min android 📦aar
✔️ 7de6cd2 #10 2022-07-26 08:26:49 ~2 min ios 📦zip
✔️ 7de6cd2 #11 2022-07-26 08:28:17 ~4 min linux 📦zip
✔️ 7de6cd2 #10 2022-07-26 08:28:44 ~4 min android 📦aar
✔️ 882dcdb #12 2022-08-02 09:04:22 ~2 min linux 📦zip
✔️ 882dcdb #11 2022-08-02 09:06:00 ~4 min ios 📦zip
✔️ 882dcdb #11 2022-08-02 09:06:19 ~4 min android 📦aar
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ ae7a62c #13 2022-08-02 09:06:15 ~1 min linux 📦zip
✔️ ae7a62c #12 2022-08-02 09:08:22 ~2 min ios 📦zip
✔️ ae7a62c #12 2022-08-02 09:09:41 ~3 min android 📦aar
✔️ 0c83fdb #14 2022-08-04 11:58:19 ~2 min linux 📦zip
✔️ 0c83fdb #13 2022-08-04 11:59:03 ~3 min ios 📦zip
✔️ 0c83fdb #13 2022-08-04 12:00:01 ~4 min android 📦aar


func (m *Messenger) ExtractDiscordChannelsAndCategories(filesToImport []string) (*MessengerResponse, error) {

response := &MessengerResponse{}
Member Author

Theoretically, this function doesn't have to return a MessengerResponse. It just happens to be the case because I started out with a synchronous RPC API and this was the fastest way to get data across the wire.

I later added an asynchronous option, because doing this synchronously might take too long.
So we could decide to drop the synchronous RPC API altogether, which means we could remove the newly introduced DiscordCategories, DiscordChannels and DiscordOldestMessageTimestamp properties from MessengerResponse.

@cammellos @Samyoul let me know what you think!

Contributor

In general, it's simpler if APIs are synchronous. If you need to, you can wrap them in an asynchronous layer in the client (that's what we do in status-react) or provide an Async endpoint, so I would use async endpoints sparingly.

Member Author

> you wrap them in an asynchronous layer through the client (that's what we do in status-react)

I agree... The async wrapper story in desktop is unfortunately far from ideal, so it'd be great if we could have that Async endpoint in status-go.

Would also rename it from Request* to *Async.

@0x-r4bbit 0x-r4bbit mentioned this pull request Jul 13, 2022
56 tasks
@0x-r4bbit 0x-r4bbit force-pushed the feat/discord-import branch 2 times, most recently from d439b19 to 18e9d7b Compare July 13, 2022 11:28
@0x-r4bbit
Member Author

Added a test for ExtractDiscordCategoriesAndChannels API

0x-r4bbit added a commit to status-im/status-desktop that referenced this pull request Jul 13, 2022
This is a work in progress but here's what works:

- When creating a community, users can choose to "import a discord
community"
- In the "create community" flow there a few more steps related to
importing discord histories
- The first step is to choose files to import
- The next step is to choose the channels and categories to import

This needs status-im/status-go#2752
This also needs: status-im/StatusQ#770
And: status-im/StatusQ#771

**There are no designs for this atm so everything you see is based on
common sense, but subject to change**

Feel free to leave early feedback.
@0x-r4bbit 0x-r4bbit force-pushed the feat/discord-import branch 2 times, most recently from 2cbe942 to e10cb66 Compare July 19, 2022 10:48
@0x-r4bbit 0x-r4bbit changed the base branch from develop to feat/messenger-flag July 19, 2022 10:48
@0x-r4bbit 0x-r4bbit force-pushed the feat/discord-import branch from e10cb66 to 7cc7adb Compare July 19, 2022 10:49
oldestMessageTime = int(discordExportedData.Messages[0].Timestamp)
}
}
return discordCategories, discordChannels, oldestMessageTime, nil
Member Author

I've extracted this logic into a more generic function because we need it again in the next iteration, where we perform the actual import. That step also receives the files to import and has to extract the data again (although possibly only a subset).

@0x-r4bbit 0x-r4bbit force-pushed the feat/discord-import branch 2 times, most recently from 9ef4407 to efa25df Compare July 19, 2022 12:41
Comment on lines 57 to 72
message DiscordMessage {
string id = 1;
string type = 2;
string timestamp = 3;
string timestamp_edited = 4;
string content = 5;
DiscordMessageAuthor author = 6;
}

message DiscordMessageAuthor {
string id = 1;
string name = 2;
string discriminator = 3;
string nickname = 4;
string avatar_url = 5;
}
Member

When are these protobufs used for protobuf related things?

Member Author

Already answered offline, but for transparency's sake I'll echo my response here:

As part of this PR they indeed only serve as structs. However, later on, when a new ChatMessage type for DiscordMessage is introduced, this data has to live in a protobuf because that's the type we rely on to save messages in the database.

In fact, I started out with just simple structs but then had to move them into protobuf to be able to store the data.

@0x-r4bbit 0x-r4bbit force-pushed the feat/messenger-flag branch 2 times, most recently from 1afe36f to 4db19f4 Compare July 21, 2022 18:55
@0x-r4bbit 0x-r4bbit force-pushed the feat/discord-import branch from efa25df to 7de6cd2 Compare July 21, 2022 19:12
@0x-r4bbit
Member Author

Pinging @qoqobolo: this one is ready for mobile testing as well! :)

@qoqobolo qoqobolo self-assigned this Jul 22, 2022
@anastasiyaig anastasiyaig added the Tested: desktop checked for regression on desktop client label Jul 26, 2022
Base automatically changed from feat/messenger-flag to develop July 26, 2022 08:23
0x-r4bbit added a commit to status-im/status-desktop that referenced this pull request Jul 26, 2022
Member

@richard-ramos richard-ramos left a comment

💯

@0x-r4bbit
Member Author

Holding back on merging this one. I'll probably revise it a little to account for what is required by latest designs.

Member

@Samyoul Samyoul left a comment

Thanks for your answers, this all looks great. Another nice PR.

@0x-r4bbit 0x-r4bbit force-pushed the feat/discord-import branch 3 times, most recently from 053b02d to ae7a62c Compare August 2, 2022 09:04
// has no messages, or is not parsable.
Code uint `json:"code"`
Message string `json:"message"`
}
Member Author

@Samyoul This now introduces this new ImportError type that we've discussed offline.
Anything you would change here?

@@ -1778,3 +1782,107 @@ func (m *Messenger) SyncCommunitySettings(ctx context.Context, settings *communi
chat.LastClockValue = clock
return m.saveChat(chat)
}

func (m *Messenger) ExtractDiscordDataFromImportFiles(filesToImport []string) (*discord.ExtractedData, map[string]*discord.ImportError) {
Member Author

@Samyoul Notice that I've changed the signature to return a map[string]*discord.ImportError

This is so that we know which filePath caused which error, and so that errors can be looked up quickly in O(1) instead of iterating over a list of errors in the client.
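The lookup difference can be illustrated with a short comparison. The `ImportError` fields follow the struct shown earlier in this PR; `pathError`, `findInSlice`, `findInMap`, the code value and the file names are hypothetical and exist only for this sketch.

```go
package main

import "fmt"

// ImportError follows the struct introduced in this PR.
type ImportError struct {
	Code    uint   `json:"code"`
	Message string `json:"message"`
}

// pathError is a hypothetical list entry, needed only for the slice variant.
type pathError struct {
	Path string
	Err  *ImportError
}

// findInSlice scans the list until the path matches: O(n) per lookup.
func findInSlice(errs []pathError, path string) *ImportError {
	for _, e := range errs {
		if e.Path == path {
			return e.Err
		}
	}
	return nil
}

// findInMap indexes directly by file path: O(1) on average per lookup.
// It returns nil when the file produced no error.
func findInMap(errs map[string]*ImportError, path string) *ImportError {
	return errs[path]
}

func main() {
	parseErr := &ImportError{Code: 1, Message: "file is not parsable"}
	asSlice := []pathError{{Path: "general.json", Err: parseErr}}
	asMap := map[string]*ImportError{"general.json": parseErr}

	fmt.Println(findInSlice(asSlice, "general.json") == findInMap(asMap, "general.json"))
	fmt.Println(findInMap(asMap, "random.json") == nil)
}
```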

return extractedData, errors
}

func (m *Messenger) ExtractDiscordChannelsAndCategories(filesToImport []string) (*MessengerResponse, map[string]*discord.ImportError) {
Member Author

Because the underlying ExtractDiscordDataFromImportFiles API returns a map[string]*discord.ImportError, this API does so as well, and the map then also bubbles up to the RPC API.

As discussed offline, this works and should be fine. If you have any more thoughts or preferences here in terms of changes, please let me know

response.DiscordChannels,
int64(response.DiscordOldestMessageTimestamp),
errors)
}()
Member Author

Another change here:

Previously, this would emit an ExtractDiscordCategoriesAndChannelsFailed signal in case there was an error.
Now there can be 0 or n errors and it's semantically okay if only a subset of import files cause an error.

So instead of emitting the failure signal, we now only emit the DiscordCategoriesAndChannelsExtracted signal with data that can include 0 or n errors. It's up to the client to decide how much of a failure that is :D
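One possible client-side policy for interpreting a signal with 0 or n errors is sketched below. The `Outcome` type, its values and the rule "all files failed means failure" are assumptions chosen for illustration; a real client may well weigh error codes differently.

```go
package main

import "fmt"

// ImportError follows the struct introduced in this PR.
type ImportError struct {
	Code    uint
	Message string
}

// Outcome is a hypothetical client-side classification of one extraction run.
type Outcome string

const (
	OutcomeSuccess Outcome = "success"
	OutcomePartial Outcome = "partial"
	OutcomeFailure Outcome = "failure"
)

// classify treats the run as failed only when every requested file produced
// an error, and as partially successful when just a subset did.
func classify(filesToImport []string, errs map[string]*ImportError) Outcome {
	switch {
	case len(errs) == 0:
		return OutcomeSuccess
	case len(errs) >= len(filesToImport):
		return OutcomeFailure
	default:
		return OutcomePartial
	}
}

func main() {
	files := []string{"general.json", "random.json"}
	errs := map[string]*ImportError{
		"random.json": {Code: 1, Message: "file has no messages"},
	}
	fmt.Println(classify(files, errs))
}
```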

@0x-r4bbit
Member Author

@qoqobolo sorry to ping you again here. I've done some changes to this PR (also with new tests) and while it's not super invasive, I still think we should run this through your battery of tests once more.


errors := map[string]*discord.ImportError{}

for _, fileToImport := range filesToImport {
Member Author

Oh and one thought here:

I think we can optimize this even further and spin up goroutines for each file here.
By running this concurrently we should see an increase in speed.

Would do this in another iteration though. I've run this the way it is with many discord channel files and it only took a few seconds at max already.

Member

Yes, this is a great idea and this should probably be done in another PR. Concurrency can introduce a bunch of issues, so this is fine for now.
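For reference, the per-file concurrency idea discussed here might be sketched like this. It is not part of this PR: `extractFile` and `extractAll` are hypothetical, the per-file work is a placeholder, and a mutex guards the shared maps because concurrent map writes are not safe in Go.

```go
package main

import (
	"fmt"
	"sync"
)

// ImportError follows the struct introduced in this PR.
type ImportError struct {
	Code    uint
	Message string
}

// extractFile is a placeholder for the per-file read-and-parse step.
func extractFile(path string) (string, *ImportError) {
	if path == "" {
		return "", &ImportError{Code: 1, Message: "empty file path"}
	}
	return "channel:" + path, nil
}

// extractAll processes every file in its own goroutine and collects results
// and errors keyed by file path.
func extractAll(paths []string) (map[string]string, map[string]*ImportError) {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	channels := map[string]string{}
	errs := map[string]*ImportError{}

	for _, p := range paths {
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			ch, err := extractFile(path)

			// Serialize access to the shared maps.
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs[path] = err
				return
			}
			channels[path] = ch
		}(p)
	}

	wg.Wait() // block until every file has been processed
	return channels, errs
}

func main() {
	channels, errs := extractAll([]string{"general.json", "random.json", ""})
	fmt.Println(len(channels), len(errs))
}
```

Keying both maps by file path means the result does not depend on the order in which goroutines finish, which sidesteps one of the ordering issues concurrency can introduce.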

@qoqobolo

qoqobolo commented Aug 2, 2022

@PascalPrecht sure, no problem!
On it.

@qoqobolo

qoqobolo commented Aug 2, 2022

@PascalPrecht no issues detected by e2e.

Not sure if it needs to be re-reviewed / re-approved, so not moving it to the Merge column for now.
Feel free to do this when PR is ready.

@Samyoul
Member

Samyoul commented Aug 3, 2022

@PascalPrecht I've gone through the changes and they all look fine to me. The only potential issue, though not from my side, is an inconsistency with other API methods, which return only a single error as the second return parameter. I think this is fine: we pass multiple inputs via the API and receive multiple outputs. We want the importing to be fast and robust.

Closes: status-im/status-desktop#6690
@0x-r4bbit 0x-r4bbit force-pushed the feat/discord-import branch from ae7a62c to 0c83fdb Compare August 4, 2022 11:55
@0x-r4bbit
Member Author

Thanks @Samyoul for taking the time and giving your approval. In the absolute worst case we can change this to a []*discord.ImportError later on. But having a map[string]*discord.ImportError makes it easier and faster to locate relevant errors later in the clients.

I'll merge this once CI is green

@0x-r4bbit 0x-r4bbit merged commit 9c568c5 into develop Aug 4, 2022
@0x-r4bbit 0x-r4bbit deleted the feat/discord-import branch August 4, 2022 12:34
Labels
Tested: desktop checked for regression on desktop client Tested: mobile checked for regression on mobile client