-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: introduce messenger APIs to extract discord channels #2752
Conversation
Pull Request Checklist
|
protocol/discord_importer/types.go
Outdated
type DiscordExportedChannel struct { | ||
Channel DiscordChannel `json:"channel"` | ||
Messages []DiscordMessage `json:"messages"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chances are, these types will be extended with additional properties as soon as I start working on converting discord messages to Waku messages.
protocol/messenger_communities.go
Outdated
}() | ||
} | ||
|
||
func (m *Messenger) ExtractDiscordChannelsAndCategories(filesToImport []string) (*MessengerResponse, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function I'll probably move into protocol/discord_importer
Jenkins BuildsClick to see older builds (34)
|
protocol/messenger_communities.go
Outdated
|
||
func (m *Messenger) ExtractDiscordChannelsAndCategories(filesToImport []string) (*MessengerResponse, error) { | ||
|
||
response := &MessengerResponse{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically, this function doesn't have to return a MessengerResponse
. It just happens to be the case because I started out with a synchronous RPC API and this was the fastest way to get data across the wire.
Eventually I've added an asynchronous option, because it might take too long to do this synchronously.
So we could decide to drop the synchronous RPC API altogether, which means we could remove the newly introduced DiscordCategories
, DiscordChannels
and DiscordOldestMessageTimestamp
properties from MessengerResponse
.
@cammellos @Samyoul let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, it's simpler if API's are synchronous, and if you need, you wrap them in an asynchronous layer through the client (that's what we do in status-react) or provide an Async
endpoint, so I would use async endpoint sparingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you wrap them in an asynchronous layer through the client (that's what we do in status-react)
I agree... The async wrapper story in desktop is unfortunately far from ideal, so it'd be great if we could have that Async
endpoint in status-go.
Would also rename it from Request*
to *Async
.
d439b19
to
18e9d7b
Compare
Added a test for |
This is a work in progress but here's what works: - When creating a community, users can choose to "import a discord community" - In the "create community" flow there a few more steps related to importing discord histories - The first step is to choose files to import - The next step is to choose the channels and categories to import This needs status-im/status-go#2752 This also needs: status-im/StatusQ#770 And: status-im/StatusQ#771 **There are no designs for this atm so everything you see is based on common sense, but subject to change** Feel free to leave early feedback.
2cbe942
to
e10cb66
Compare
e10cb66
to
7cc7adb
Compare
protocol/messenger_communities.go
Outdated
oldestMessageTime = int(discordExportedData.Messages[0].Timestamp) | ||
} | ||
} | ||
return discordCategories, discordChannels, oldestMessageTime, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've extracted this logic into a more generic function because we need it in the next iteration where we perform the actual import (which also receives the files to import from and needs to extract the data again as well [although it might just be a subset])
9ef4407
to
efa25df
Compare
protocol/protobuf/chat_message.proto
Outdated
message DiscordMessage { | ||
string id = 1; | ||
string type = 2; | ||
string timestamp = 3; | ||
string timestamp_edited = 4; | ||
string content = 5; | ||
DiscordMessageAuthor author = 6; | ||
} | ||
|
||
message DiscordMessageAuthor { | ||
string id = 1; | ||
string name = 2; | ||
string discriminator = 3; | ||
string nickname = 4; | ||
string avatar_url = 5; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When are these protobufs used for protobuf related things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already answered offline, but for transparency's sake I'll echo my response here:
As part of this PR they indeed only serve as structs. However, later on, when a new ChatMessage
type for DiscordMessage
is introduce, this data has to live in a protobuf because that's the type we rely on to save messages in the database.
I started out in fact with just simple structs but then had to move them in protobuf to be able to store data.
1afe36f
to
4db19f4
Compare
efa25df
to
7de6cd2
Compare
Pinging @qoqobolo this as well is ready for mobile testing! :) |
This is a work in progress but here's what works: - When creating a community, users can choose to "import a discord community" - In the "create community" flow there a few more steps related to importing discord histories - The first step is to choose files to import - The next step is to choose the channels and categories to import This needs status-im/status-go#2752 This also needs: status-im/StatusQ#770 And: status-im/StatusQ#771 **There are no designs for this atm so everything you see is based on common sense, but subject to change** Feel free to leave early feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
Holding back on merging this one. I'll probably revise it a little to account for what is required by latest designs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your answers, this all looks great. Another nice PR.
053b02d
to
ae7a62c
Compare
// has no messages, or is not parsable. | ||
Code uint `json:"code"` | ||
Message string `json:"message"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Samyoul This now introduces this new ImportError
type that we've discussed offline.
Anything you would change here?
@@ -1778,3 +1782,107 @@ func (m *Messenger) SyncCommunitySettings(ctx context.Context, settings *communi | |||
chat.LastClockValue = clock | |||
return m.saveChat(chat) | |||
} | |||
|
|||
func (m *Messenger) ExtractDiscordDataFromImportFiles(filesToImport []string) (*discord.ExtractedData, map[string]*discord.ImportError) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Samyoul Notice that I've changed the signature to return a map[string]*discord.ImportError
This is so that we know which filePath
caused which errors (and can be quickly looked-up O(1)
instead of iterating over a list of errors in the client.
return extractedData, errors | ||
} | ||
|
||
func (m *Messenger) ExtractDiscordChannelsAndCategories(filesToImport []string) (*MessengerResponse, map[string]*discord.ImportError) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the underlying ExtractDiscordDataFromImportFiles
API reutrns a map[string]*discord.ImportError
, this API will do so as well, which is then also bubbles up to the RPC API.
As discussed offline, this works and should be fine. If you have any more thoughts or preferences here in terms of changes, please let me know
response.DiscordChannels, | ||
int64(response.DiscordOldestMessageTimestamp), | ||
errors) | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another change here:
Previously, this would emit s ExtractDiscordCategoriesAndChannelsFailed
signal in case there was an error.
Now there can be 0
or n
errors and it's semantically okay if only a subset of import files cause an error.
So instead of emitting the failure signal, we now only emit the DiscordCategoriesAndChannelsExtracted
signal with data that can include 0
or n
errors. It's up to the client to decide how much of a failure that is :D
@qoqobolo sorry to ping you again here. I've done some changes to this PR (also with new tests) and while it's not super invasive, I still think we should run this through your battery of tests once more. |
|
||
errors := map[string]*discord.ImportError{} | ||
|
||
for _, fileToImport := range filesToImport { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and one thought here:
I think we can optimize this even further and spin up goroutines for each file here.
By running this concurrently we should see an increase in speed.
Would do this in another iteration though. I've run this the way it is with many discord channel files and it only took a few seconds at max already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a great idea and this should probably be done in another PR. Concurrency can introduce a bunch of issues, so this is fine for now.
@PascalPrecht sure, no problem! |
@PascalPrecht no issues detected by e2e. Not sure if it needs to be re-reviewed / re-approved, so not moving it to the |
@PascalPrecht I've gone through the changes and they all look fine to me. The only potential issue, but not on my side, is a potential inconsistency with other API methods returning only a single error as a 2nd return parameter, but I think this is fine. We pass multiple inputs via the API and receive multiple outputs, that's fine to me. we want the importing to be fast and robust. |
As part of the new Discord <-> Status Community Import functionality, we're adding an API that extracts all discord categories and channels from a previously exported discord export file. These APIs can be used in clients to show the user what categories and channels will be imported later on. There are two APIs: 1. `Messenger.ExtractDiscordCategoriesAndChannels(filesToimport []string) (*MessengerResponse, map[string]*discord.ImportError)` This takes a list of exported discord export (JSON) files (typically one per channel), reads them, and extracts the categories and channels into dedicated data structures (`[]DiscordChannel` and `[]DiscordCategory`) It also returns the oldest message timestamp found in all extracted channels. The API is synchronous and returns the extracted data as a `*MessengerResponse`. This allows to make the API available status-go's RPC interface. The error case is a `map[string]*discord.ImportError` where each key is a file path of a JSON file that we tried to extract data from, and the value a `discord.ImportError` which holds an error message and an error code, allowing for distinguishing between "critical" errors and "non-critical" errors. 2. `Messenger.RequestExtractDiscordCategoriesAndChannels(filesToImport []string)` This is the asynchronous counterpart to `ExtractDiscordCategoriesAndChannels`. The reason this API has been added is because discord servers can have a lot of message and channel data, which causes `ExtractDiscordCategoriesAndChannels` to block the thread for too long, making apps potentially feel like they are stuck. This API runs inside a go routine, eventually calls `ExtractDiscordCategoriesAndChannels`, and then emits a newly introduced `DiscordCategoriesAndChannelsExtractedSignal` that clients can react to. Failure of extraction has to be determined by the `discord.ImportErrors` emitted by the signal. **A note about exported discord history files** We expect users to export their discord histories via the [DiscordChatExporter](https://github.com/Tyrrrz/DiscordChatExporter/wiki/GUI%2C-CLI-and-Formats-explained#exportguild) tool. The tool allows to export the data in different formats, such as JSON, HTML and CSV. We expect users to have their data exported as JSON. Closes: status-im/status-desktop#6690
ae7a62c
to
0c83fdb
Compare
Thanks @Samyoul for taking the time and giving your approval. In the absolute worst case we can change this to an I'll merge this once CI is green |
As part of the new Discord <-> Status Community Import functionality,
we're adding an API that extracts all discord categories and channels
from a previously exported discord export file.
These APIs can be used in clients to show the user what categories and
channels will be imported later on.
There are two APIs:
Messenger.ExtractDiscordCategoriesAndChannels(filesToimport []string) (*MessengerResponse, map[string]*discord.ImportError)
This takes a list of exported discord export (JSON) files (typically one per
channel), reads them, and extracts the categories and channels into
dedicated data structures (
[]DiscordChannel
and[]DiscordCategory
)It also returns the oldest message timestamp found in all extracted
channels.
The API is synchronous and returns the extracted data as
a
*MessengerResponse
. This allows to make the API availablestatus-go's RPC interface.
The error case is a
map[string]*discord.ImportError
where each keyis a file path of a JSON file that we tried to extract data from, and
the value a
discord.ImportError
which holds an error message and anerror code, allowing for distinguishing between "critical" errors and
"non-critical" errors.
Messenger.RequestExtractDiscordCategoriesAndChannels(filesToImport []string)
This is the asynchronous counterpart to
ExtractDiscordCategoriesAndChannels
. The reason this API has beenadded is because discord servers can have a lot of message and
channel data, which causes
ExtractDiscordCategoriesAndChannels
toblock the thread for too long, making apps potentially feel like they
are stuck.
This API runs inside a go routine, eventually calls
ExtractDiscordCategoriesAndChannels
, and then emits a newlyintroduced
DiscordCategoriesAndChannelsExtractedSignal
that clientscan react to.
Failure of extraction has to be determined by the
discord.ImportErrors
emitted by the signal.A note about exported discord history files
We expect users to export their discord histories via the
DiscordChatExporter
tool. The tool allows to export the data in different formats, such as
JSON, HTML and CSV.
We expect users to have their data exported as JSON.