Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Add path.getsize or file.getsize methods #1130

Closed
1 task done
tairov opened this issue Oct 21, 2023 · 8 comments
Closed
1 task done

[Feature Request] Add path.getsize or file.getsize methods #1130

tairov opened this issue Oct 21, 2023 · 8 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed mojo Issues that are related to mojo mojo-repo Tag all issues with this label mojo-stdlib Tag for issues related to standard library

Comments

@tairov
Copy link

tairov commented Oct 21, 2023

Review Mojo's priorities

What is your request?

Mojo v0.4.0 now supports file module, but it seems that we still need one essential function -- getsize or filesize
With native Mojo file module we have to load whole file into memory just to know its size.

    var f_ = open(file_name, "r")
    let data = f_.read()
    let cp_size = data._buffer.size

What is your motivation for this change?

So far in llama2.🔥 we have to use workaround via Python.

    let _os = Python.import_module("os")
    let ff_size = _os.path.getsize(file_name)
    let cp_size = string.atol(ff_size.to_string())

It's turned out newcomers to Mojo who are trying to run llama2 locally, got multiple issues because of misconfiguration with MOJO_PYTHON_LIBRARY, etc.

It would be great to add getsize or filesize to file or path modules in the next release.

Any other details?

In Python it's implemented in path module

    let _os = Python.import_module("os")
    let ff_size = _os.path.getsize(file_name)

I think it's worthwhile to implement the more general os.stat , since fs_stat must provide more details about file, but I'm not sure what are the concerns regarding platforms compatiblity.

# python alternatives
os.path.getsize()
os.stat()
@tairov tairov added enhancement New feature or request mojo Issues that are related to mojo labels Oct 21, 2023
@gryznar
Copy link
Contributor

gryznar commented Oct 21, 2023

I like this idea, but how about divide words via "_"? In my opinion get_size looks much better than getsize

@tairov
Copy link
Author

tairov commented Oct 21, 2023

Proper naming is really tricky for the language that supposed to be a "drop-in replacement for Python" .

Let's see how it feels in other languages :

Cpp

int main(int argc, char *argv[]) {
  std::filesystem::path p{argv[1]};
  std::cout << "The size of " << p.u8string() << " is " <<
      std::filesystem::file_size(p) << " bytes.\n";
}

Zig

var file = try std.fs.cwd().openFile("file.txt", .{ open = true }));
const file_size = (try file.stat()).size;

Julia

  filesize(path...)
  Equivalent to stat(file).size.

Python

os.path.getsize(file_name)

Swift

let fileUrl: URL
print("file size = \(fileUrl.fileSize), \(fileUrl.fileSizeString)")

I think the Python version is most expected variation (for Python switchers at least)

@abduld
Copy link
Contributor

abduld commented Oct 21, 2023

Thanks @tairov for requesting this (in the the live stream of all places :) ) . We'll make sure this is prioritized.

Keep up the great work

@gryznar
Copy link
Contributor

gryznar commented Oct 21, 2023

If you use tools which will give you code completion, you will obtain both get_size and getsize, so I don't think that is the case here. Readability counts and it is greater with seperated words than mixed up (see simdbitwidth, simdwithof and so on

@gryznar
Copy link
Contributor

gryznar commented Oct 21, 2023

Also, most functions in Mojo have divided words, so every case which breaks from that looks strange

@tairov
Copy link
Author

tairov commented Oct 21, 2023

Also, most functions in Mojo have divided words, so every case which breaks from that looks strange

I hope till stable release the Mojo team will stabilize API & naming conventions.. In other case it could end-up "PHP" scenario. The names of the functions laid down in the first releases have already turned out to be extremely difficult to isolate from the language, and as a result, the entire standard library of functions has become quite heterogeneous. This problem was solved only with the introduction of high-level frameworks (like Laravel & Symfony) that abstracted methods from the standard PHP library and made framework API more clean.

I'm honestly really not sure about proper approach here.. I would suggest to make the API more closer to Python, since it might help with Python -> Mojo switch & learning curve.

As you mentioned, if we use code-completion tools it doesn't make much difference I believe. Especially taking into account the long-term goals of creating a coherent language and further adaptation to it by the community.. Probably by that time you won’t have to read the code, everything will be explained to you by the LLM/AI agent 😀

@tairov
Copy link
Author

tairov commented Oct 21, 2023

Thanks @tairov for requesting this (in the the live stream of all places :) ) . We'll make sure this is prioritized.

Keep up the great work

Thanks @abduld for your feedback.. BTW, I decided to vectorize "feature requests", so a small cart there, a couple more requests #1134 & #1135 😅

PS. It was a great honor to share the livestream stage with your team!

@gryznar
Copy link
Contributor

gryznar commented Oct 21, 2023

Yeah, stable and standardized naming convention would be a great benefit over Python which Mojo tends to be a better version (Python++). This also mean, that Mojo could fix these Python problems related to naming. Mojo should take the best from Python and fix what may be improved.

I will vote for being consistent with own rules and not to break them when trying to replicate Python. As far as I know this strict replication is not possible, so sticking with exactly same names as Python has, is not the best idea IMHO.

Please to be consistent first of all, even if this will require to take other name than Python has.

Edit: one of the big problem related to keeping the same names as Python has is the need of synchronisation with its changes. Let's suppose, that Mojo will take all the names which Python has in some library. If Python will decide to deprecate some function and propose another one instead, Mojo, following changes, should also deprecate this method in its stdlib. Keeping everything in sync would led to maintenance hell.
Edit2: the second reason is the need to have the same signature which may be not optimal (Mojo would like for example to have parametrization). The same problem with keeping that in sync applies

@ematejska ematejska added the mojo-stdlib Tag for issues related to standard library label Oct 23, 2023
@ematejska ematejska added the mojo-repo Tag all issues with this label label Apr 29, 2024
@ematejska ematejska removed the mojo-stdlib Tag for issues related to standard library label May 3, 2024
@ematejska ematejska added the mojo-stdlib Tag for issues related to standard library label May 3, 2024 — with Linear
@ematejska ematejska removed the mojo-stdlib Tag for issues related to standard library label May 6, 2024
@ematejska ematejska added the mojo-stdlib Tag for issues related to standard library label May 6, 2024 — with Linear
@JoeLoser JoeLoser added good first issue Good for newcomers help wanted Extra attention is needed labels May 8, 2024
modularbot added a commit that referenced this issue May 21, 2024
[External] [stdlib] Implement `os.path.getsize`

Add `getsize` method for getting the size (in bytes)
of a path.

Fixes #1130

ORIGINAL_AUTHOR=artemiogr97
<57588855+artemiogr97@users.noreply.github.com>
PUBLIC_PR_LINK=#2626

Co-authored-by: artemiogr97 <57588855+artemiogr97@users.noreply.github.com>
Closes #2626
MODULAR_ORIG_COMMIT_REV_ID: 443260f3be31a694c9e93fc11b64d0317646c4e0
martinvuyk pushed a commit to martinvuyk/mojo that referenced this issue May 24, 2024
[External] [stdlib] Implement `os.path.getsize`

Add `getsize` method for getting the size (in bytes)
of a path.

Fixes modularml#1130

ORIGINAL_AUTHOR=artemiogr97
<57588855+artemiogr97@users.noreply.github.com>
PUBLIC_PR_LINK=modularml#2626

Co-authored-by: artemiogr97 <57588855+artemiogr97@users.noreply.github.com>
Closes modularml#2626
MODULAR_ORIG_COMMIT_REV_ID: 443260f3be31a694c9e93fc11b64d0317646c4e0
modularbot added a commit that referenced this issue Jun 7, 2024
[External] [stdlib] Implement `os.path.getsize`

Add `getsize` method for getting the size (in bytes)
of a path.

Fixes #1130

ORIGINAL_AUTHOR=artemiogr97
<57588855+artemiogr97@users.noreply.github.com>
PUBLIC_PR_LINK=#2626

Co-authored-by: artemiogr97 <57588855+artemiogr97@users.noreply.github.com>
Closes #2626
MODULAR_ORIG_COMMIT_REV_ID: 443260f3be31a694c9e93fc11b64d0317646c4e0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed mojo Issues that are related to mojo mojo-repo Tag all issues with this label mojo-stdlib Tag for issues related to standard library
Projects
None yet
Development

No branches or pull requests

5 participants