Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR#72453 : Exceeding maximum file name length #72654

Merged
merged 8 commits into from
Jan 14, 2024

Conversation

shahidiqbal13
Copy link
Contributor

This issue is raised by @DrTodd13

The code in include/llvm/Analysis/DOTGraphTraitsPass.h will exceed most normal file system's maximum filename length of 255 if, e.g., the function's name is that length.

@llvmbot llvmbot added clang Clang issues not falling into any other category llvm:analysis labels Nov 17, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 17, 2023

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-analysis

Author: Shahid Iqbal (shahidiqbal13)

Changes

This issue is raised by @DrTodd13

The code in include/llvm/Analysis/DOTGraphTraitsPass.h will exceed most normal file system's maximum filename length of 255 if, e.g., the function's name is that length.


Full diff: https://github.com/llvm/llvm-project/pull/72654.diff

2 Files Affected:

  • (modified) clang/NOTES.txt (+2)
  • (modified) llvm/include/llvm/Analysis/DOTGraphTraitsPass.h (+5-3)
diff --git a/clang/NOTES.txt b/clang/NOTES.txt
index f06ea8c70cd3409..c83dda52a1fc21e 100644
--- a/clang/NOTES.txt
+++ b/clang/NOTES.txt
@@ -4,6 +4,8 @@
 
 //===---------------------------------------------------------------------===//
 
+//TESTING git infra//
+
 To time GCC preprocessing speed without output, use:
    "time gcc -MM file"
 This is similar to -Eonly.
diff --git a/llvm/include/llvm/Analysis/DOTGraphTraitsPass.h b/llvm/include/llvm/Analysis/DOTGraphTraitsPass.h
index 07c08bc1cc3bcb6..f78d8ff52ee3932 100644
--- a/llvm/include/llvm/Analysis/DOTGraphTraitsPass.h
+++ b/llvm/include/llvm/Analysis/DOTGraphTraitsPass.h
@@ -17,6 +17,8 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/GraphWriter.h"
 
+#define MAX_FILENAME_LEN 255
+
 namespace llvm {
 
 /// Default traits class for extracting a graph from an analysis pass.
@@ -94,7 +96,7 @@ void printGraphForFunction(Function &F, GraphT Graph, StringRef Name,
   raw_fd_ostream File(Filename, EC, sys::fs::OF_TextWithCRLF);
   std::string GraphName = DOTGraphTraits<GraphT>::getGraphName(Graph);
 
-  if (!EC)
+  if (!EC && (Filename.length() <= MAX_FILENAME_LEN ))
     WriteGraph(File, Graph, IsSimple,
                GraphName + " for '" + F.getName() + "' function");
   else
@@ -280,7 +282,7 @@ class DOTGraphTraitsModulePrinterWrapperPass : public ModulePass {
     raw_fd_ostream File(Filename, EC, sys::fs::OF_TextWithCRLF);
     std::string Title = DOTGraphTraits<GraphT>::getGraphName(Graph);
 
-    if (!EC)
+    if (!EC && (Filename.length() <= MAX_FILENAME_LEN ))
       WriteGraph(File, Graph, IsSimple, Title);
     else
       errs() << "  error opening file for writing!";
@@ -310,7 +312,7 @@ void WriteDOTGraphToFile(Function &F, GraphT &&Graph,
   std::string GraphName = DOTGraphTraits<GraphT>::getGraphName(Graph);
   std::string Title = GraphName + " for '" + F.getName().str() + "' function";
 
-  if (!EC)
+  if (!EC && (Filename.length() <= MAX_FILENAME_LEN ))
     WriteGraph(File, Graph, IsSimple, Title);
   else
     errs() << "  error opening file for writing!";

Copy link

github-actions bot commented Nov 17, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link

@DrTodd13 DrTodd13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these changes as they are solve the problem.

@@ -94,7 +96,7 @@ void printGraphForFunction(Function &F, GraphT Graph, StringRef Name,
raw_fd_ostream File(Filename, EC, sys::fs::OF_TextWithCRLF);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Filename is too long, then EC here will already contain an error code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to work around this bug when I found it so in my local version I did a (Name.str() + "." + F.getName().str()).substr(0,250) + ".dot". In this way, I just truncate the lengthy filename to 250 characters before adding the ".dot". However, this approach has the problem that two functions who names differ only past the 250 character mark will try to write to the same file. I don't know what the right solution is here and that's why I posted it for discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DrTodd13 ,
Will change the fix later , need to think

@@ -94,7 +96,7 @@ void printGraphForFunction(Function &F, GraphT Graph, StringRef Name,
raw_fd_ostream File(Filename, EC, sys::fs::OF_TextWithCRLF);
std::string GraphName = DOTGraphTraits<GraphT>::getGraphName(Graph);

if (!EC)
if (!EC && (Filename.length() <= MAX_FILENAME_LEN))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this does anything since if filename is too long then an error code EC will already be set.

@@ -280,7 +282,7 @@ class DOTGraphTraitsModulePrinterWrapperPass : public ModulePass {
raw_fd_ostream File(Filename, EC, sys::fs::OF_TextWithCRLF);
std::string Title = DOTGraphTraits<GraphT>::getGraphName(Graph);

if (!EC)
if (!EC && (Filename.length() <= MAX_FILENAME_LEN))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above.

@@ -310,7 +312,7 @@ void WriteDOTGraphToFile(Function &F, GraphT &&Graph,
std::string GraphName = DOTGraphTraits<GraphT>::getGraphName(Graph);
std::string Title = GraphName + " for '" + F.getName().str() + "' function";

if (!EC)
if (!EC && (Filename.length() <= MAX_FILENAME_LEN))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above.

Copy link

@DrTodd13 DrTodd13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of that new function but I don't think it does what you intended.

@@ -17,6 +17,8 @@
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/GraphWriter.h"

static std::vector<std::string> nameObj;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there is some LLVM code approach that would dictate what to do here. We can have some kind of a data structure to remember names to try to avoid duplicates or we can accept that there may be duplicates but just make them probabilistically rare. I guess I am leaning toward the latter where you do something like take up to a certain number of characters of the filename as they are but then you take the rest of the characters of the filename and hash them and add the hash to the filename. This latter approach works even if you are going in and out of llvm whereas having a data structure to remember names would only work in the context of single execution of LLVM. Again, that is just my opinion and I'm not sure what people with approval authority here will say.

else {
for (auto it = nameObj.begin(); it != nameObj.end(); it++) {
if (*it == FN) {
FN = FN.substr(0, --len);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say that I have 3 filenames submitted in a row that share the same first 250 characters. Line 92 will add the first one truncated to 250. Then, line 97 will add the second one truncated to 249. But then, for the third one, the first time through the nameObj list, it is going to match with the first name you added, decrement len by 1 and then add to nameObj. So, I think the 2nd and 3rd names in your list are going to be duplicates.

Maybe a more classic way of doing this would be to use a set instead of a list, check if the FN truncated to 250 is in the set and if not add it. If it is then truncate to 249 and repeat the process of checking if it is in the set. Add if it isn't and if it is then truncate to 248 and keep repeating the process until it can be added. Of course, this approach has the problem that if you have more than 250 names that shared the first 250 characters then len would go to 0 and you'd try to have a filename that was empty. Then the 251st time you tried it you'd have a negative substr len. If that is equivalent to 0 len then you could end up in an infinite loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DrTodd13 ,
Yes I do agree with the 3 filenames example but the chances would be highly rare that's what y I didn't go for keep searching mechanism

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shahidiqbal13 Thanks for the changes. Just one minor point, you don't need the initial "if" to check for empty set anymore. The first time through the loop if it isn't found in the set then it will add it. Also, I don't know how much we need to consider this but sometimes ordering will be reversed and so the filename assigned to a function would be different in your approach based on which function is encountered first. With the hash approach this wouldn't happen. At this point, we need input from regular LLVM devs in terms of how they like to do things.

@shahidiqbal13
Copy link
Contributor Author

Hi @DrTodd13 ,
Any further needs to be done here ?? Can you please add more devs for reviewing this

Thanks,
Shahid

@shahidiqbal13 shahidiqbal13 self-assigned this Jan 14, 2024
@shahidiqbal13 shahidiqbal13 merged commit 777a67b into llvm:main Jan 14, 2024
3 checks passed
@shahidiqbal13 shahidiqbal13 deleted the shahid_b1 branch January 14, 2024 12:54
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category llvm:analysis
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants