# Working with Text Files in Python

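> `%%writefile` is a **magic command** that works only in **Jupyter Notebook**. It quickly creates a text file and writes the rest of the cell into it.
> The file is created in the notebook's current working directory, which is normally the folder that contains this notebook.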
```python
%%writefile test.txt
Hello, this is the quick test file.
The content below command will be file content.
```
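
Outside Jupyter the `%%writefile` magic is not available. As a minimal sketch of a plain-Python equivalent, the block below writes the same two lines with `open()` in write mode. It assumes a throwaway temporary directory (the names `demo_dir`, `test_path` and `file_text` are illustrative, not part of the notebook) so that nothing on disk is overwritten.

```python
import tempfile
from pathlib import Path

# Write into a throwaway directory; in a notebook you would usually
# write next to the notebook itself instead.
demo_dir = Path(tempfile.mkdtemp())
test_path = demo_dir / "test.txt"

file_text = (
    "Hello, this is the quick test file.\n"
    "The content below command will be file content.\n"
)

# Mode "w" creates the file, or truncates it if it already exists.
with open(test_path, "w", encoding="utf-8") as f:
    f.write(file_text)

# Read the file back to confirm what was written.
print(test_path.read_text(encoding="utf-8"))
```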

In [35]:
# open() raises FileNotFoundError if the path or file name is wrong
myfile = open("whoops.txt")

FileNotFoundError: [Errno 2] No such file or directory: 'whoops.txt'
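
If a missing file is an expected situation, the error can be caught instead of stopping the notebook. The following is only a sketch: the helper name `read_text_or_none` and the choice to return `None` are assumptions made for illustration, and the path points into a fresh temporary directory so the file is guaranteed not to exist.

```python
import tempfile
from pathlib import Path


def read_text_or_none(path):
    """Return the file's text, or None if the file does not exist."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError as err:
        # Report the problem and let the caller decide what to do next.
        print(f"Could not open file: {err}")
        return None


# A path inside a brand-new empty directory, so the file cannot exist.
missing_path = Path(tempfile.mkdtemp()) / "whoops.txt"
result = read_text_or_none(missing_path)
print(result)
```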

### Know your current working directory

In [37]:
pwd

'C:\\Users\\meyog\\NLP-Natural_Language_Processing'
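
`pwd` on its own line works in Jupyter because IPython treats it as a magic command. In a regular Python script, a rough equivalent (a sketch using only the standard library) is `Path.cwd()` or `os.getcwd()`. Relative names such as `"test.txt"` are resolved against this directory.

```python
import os
from pathlib import Path

# Current working directory as a Path object.
cwd = Path.cwd()
print(cwd)

# A relative file name is interpreted relative to the working directory.
resolved = Path("test.txt").resolve()
print(resolved)
```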

In [39]:
myfile = open("test.txt")

In [41]:
myfile.read()

'Hello, this is the quick test file.\nThe content below command will be file content.\n'

## Calling `read()` a second time on the same file object returns an empty string.

In [43]:
myfile.read()

''

## We get an empty string because the cursor is at the end of the file (EOF) after the first read.
### To move the cursor back to position 0 (the start of the file), use `seek(0)`.

In [45]:
myfile.seek(0)

0

In [47]:
myfile.readline()

'Hello, this is the quick test file.\n'

# Now we can read the first line of the file.

In [49]:
content = myfile.read()
content

'The content below command will be file content.\n'

### The cursor was at the start of the second line, so the output excludes the first line.
#### Make sure you close the file object once you are done with it.

In [51]:
myfile.close()
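
The cursor behaviour walked through above can be reproduced in one self-contained cell. This is a sketch that assumes a small temporary file with two made-up lines rather than the notebook's `test.txt`. `tell()` reports the current cursor position and is used here only to confirm that `seek(0)` returns to the start.

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on test.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as tmp:
    tmp.write("first line\nsecond line\n")

f = open(path, encoding="utf-8")
first_read = f.read()       # reads everything; the cursor is now at EOF
second_read = f.read()      # nothing left to read, so this is an empty string
f.seek(0)                   # move the cursor back to the start
start_position = f.tell()   # position right after seek(0)
first_line = f.readline()   # reads up to and including the first newline
rest = f.read()             # continues from where readline() stopped
f.close()                   # always close a file opened without "with"
os.remove(path)

print(repr(second_read), start_position, repr(first_line), repr(rest))
```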

### `readlines()` reads the whole file and returns a list of strings, one list item per line.

In [53]:
myfile = open("test.txt")
content = myfile.readlines()
print(content)
print("-----------------------------------------------------------------------")
for line in content:
    print("line- : "+line)

myfile.close()

['Hello, this is the quick test file.\n', 'The content below command will be file content.\n']
-----------------------------------------------------------------------
line- : Hello, this is the quick test file.

line- : The content below command will be file content.
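
The blank lines in the output above appear because every item returned by `readlines()` still ends with `"\n"`, and `print()` adds a newline of its own. One possible way to avoid this, sketched below on a temporary copy of the same two lines, is to iterate over the file object directly and strip the trailing newline. The variable names are illustrative.

```python
import os
import tempfile

# Recreate the two-line file in a temporary location.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as tmp:
    tmp.write("Hello, this is the quick test file.\n"
              "The content below command will be file content.\n")

cleaned_lines = []
with open(path, encoding="utf-8") as f:
    for line in f:                      # a file object yields one line at a time
        cleaned = line.rstrip("\n")     # drop the trailing newline only
        cleaned_lines.append(cleaned)
        print("line- : " + cleaned)

os.remove(path)
```

Iterating over the file object also avoids loading every line into a list first, which can matter for large files.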



# `with`: a context manager that closes the file automatically after use.

In [55]:
with open("test.txt") as myfile :
    print(myfile.read())
    

Hello, this is the quick test file.
The content below command will be file content.
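
To check that the `with` block really closes the file, the `closed` attribute of the file object can be inspected after the block ends. The sketch below uses a temporary file with made-up content and also simulates an error inside the block, to show that the file is closed in that case as well.

```python
import os
import tempfile

# Throwaway file with some content.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as tmp:
    tmp.write("some text\n")

# Normal case: the file is closed as soon as the block ends.
with open(path, encoding="utf-8") as myfile:
    text = myfile.read()
closed_after_block = myfile.closed

# Error case: the file is still closed, even though the block raised.
try:
    with open(path, encoding="utf-8") as myfile:
        raise ValueError("simulated problem while processing the file")
except ValueError:
    pass
closed_after_error = myfile.closed

os.remove(path)
print(closed_after_block, closed_after_error)
```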



# Working with PDF Files

- Often you will have to deal with PDF files.
- There are [many libraries in Python for working with PDFs](https://www.binpress.com/tutorial/manipulating-pdfs-with-python/167), each with its pros and cons; the most common one is **pypdf**.
- [pypdf installation documentation](https://pypdf.readthedocs.io/en/stable/user/installation.html)

  <code>pip install pypdf</code>
    
- Keep in mind that not every PDF file can be read with this library.
- PDFs that are too blurry, use a special encoding, are encrypted, or were created with a program that doesn't work well with **pypdf** may not be readable.
- If you find yourself in this situation, try the libraries linked above, but keep in mind that these may not work either. PDFs have many parameters and their settings can be non-standard. For example, text may be stored as an image rather than as encoded characters, in which case there is no text to extract.

- In this notebook, pypdf is used only to extract the text from a PDF document; images and other media embedded in a PDF are not covered here.
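
A quick way to notice the "text stored as an image" situation described above is to check which pages yield no text at all. The sketch below is only an illustration: `find_pages_without_text` is a hypothetical helper, and `StubPage` is a stand-in class so the cell runs without a real PDF. The helper relies only on each page having an `extract_text()` method, which is how pypdf pages are used later in this notebook.

```python
def find_pages_without_text(pages):
    """Return the 0-based indexes of pages whose extracted text is empty."""
    empty_indexes = []
    for index, page in enumerate(pages):
        text = page.extract_text() or ""   # guard against None, just in case
        if not text.strip():               # only whitespace counts as empty
            empty_indexes.append(index)
    return empty_indexes


class StubPage:
    """Stand-in for a PDF page so the example runs without a real PDF."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


stub_pages = [StubPage("Some real text"), StubPage(""), StubPage("  \n")]
print(find_pages_without_text(stub_pages))
```

With pypdf installed, the same helper could be called as `find_pages_without_text(PdfReader("some_file.pdf").pages)`, where `some_file.pdf` is a placeholder name. Pages reported here are candidates for OCR or for one of the other libraries.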

---

## Working with pypdf

In [3]:
pip install --upgrade pypdf

Collecting pypdf
  Using cached pypdf-5.7.0-py3-none-any.whl.metadata (7.2 kB)
Using cached pypdf-5.7.0-py3-none-any.whl (305 kB)
Installing collected packages: pypdf
  Attempting uninstall: pypdf
    Found existing installation: pypdf 4.2.0
    Uninstalling pypdf-4.2.0:
      Successfully uninstalled pypdf-4.2.0
Successfully installed pypdf-5.7.0
Note: you may need to restart the kernel to use updated packages.


In [9]:
pip install --upgrade pyopenssl

Collecting pyopenssl
  Downloading pyopenssl-25.1.0-py3-none-any.whl.metadata (17 kB)
Downloading pyopenssl-25.1.0-py3-none-any.whl (56 kB)
Installing collected packages: pyopenssl
  Attempting uninstall: pyopenssl
    Found existing installation: pyOpenSSL 24.2.1
    Uninstalling pyOpenSSL-24.2.1:
      Successfully uninstalled pyOpenSSL-24.2.1
Successfully installed pyopenssl-25.1.0
Note: you may need to restart the kernel to use updated packages.


In [11]:
pip install pypdf cryptography

Note: you may need to restart the kernel to use updated packages.


In [3]:
from pypdf import PdfReader

reader = PdfReader(r"D:\Yogesh\Expleo\QA - Testing\QA_Testing-Interview Question.pdf")
number_of_pages = len(reader.pages)
page = reader.pages[0]
text = page.extract_text()

In [5]:
print(number_of_pages)

40


In [15]:
print(text)

Most asked Interview Question  
1. What is Quality Assurance (QA)? 
• QA is a systematic process to ensure that software products meet specified 
quality standards and requirements. For example, testing a mobile app to ensure 
all functionalities work correctly before release. 
2. What is the software testing life cycle? 
• The software testing life cycle includes phases such as requirements analysis, 
test planning, test case development, test environment setup, test execution, 
defect reporting, and test closure. 
3. Can you explain the difference between manual testing and automated testing? 
• Manual testing involves human testers executing test cases without automation 
tools, while automated testing uses software tools to run tests automatically, 
increasing efficiency and coverage. 
4. What are some common automation testing tools you have used? 
• I have experience with Selenium, TestNG, JUnit, Cucumber, and Appium, among 
others. 
5. When would you choose to automate a test? 


In [23]:
# Let's grab the text from every page of the PDF
resume = PdfReader("Yogesh_Mahajan_Production_TeamLead.pdf")
list_page = []
# reader.pages can be iterated directly, one page object at a time
for page in resume.pages:
    list_page.append(page.extract_text())

print(list_page)
    

['Career Objective:To give my best in my professional pursuit for overallbenefit and growth of the company that I serve by facing the challenges. Iwill show my caliber and gain some experience.\uf06eWork History:A] Designation:Production Team LeaderCompany Name:SS Eduks Management Consultants Ltd.Duration:27th Sept 2021 to 7th March 2022Profile Summary:\uf0fcPlanning assigning and directing production work.\uf0fcConducted Daily Team Meetings. Prepared production report.\uf0fcCalculation of Daily Production Losses, OLE & OEE Calculations.B] Designation:Graduate Engineer TraineeCompany Name:Bajaj Electricals Ltd. Chakan, Pune (Maharashtra)Duration:14th Aug 2019 to 13th Aug 2021Profile Summary:\uf0fcSupervised all the production phases of stator assembly for fan production\uf0fcSet and revised production schedules to meet changing demands.\uf0fcEvaluated manpower skills and knowledge regularly.C] Designation:BPO-I (Content Writer)Company Name:TTEC India, Ahmedabad (Gujarat)Duration:15th J
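
The `\uf06e` and `\uf0fc` escapes in the output above are code points from the Unicode Private Use Area, most likely bullet or tick glyphs from a symbol font in the original document. One possible clean-up, sketched below, replaces every Private Use character with a plain list marker. The helper name `replace_private_use`, the default replacement string, and the shortened sample text are assumptions made for illustration.

```python
import unicodedata


def replace_private_use(text, replacement="\n- "):
    """Replace Private Use Area characters (Unicode category 'Co')."""
    return "".join(
        replacement if unicodedata.category(ch) == "Co" else ch
        for ch in text
    )


# Shortened sample in the style of the extracted resume text.
sample = "Profile Summary:\uf0fcPlanning work.\uf0fcConducted Daily Team Meetings."
cleaned_sample = replace_private_use(sample)
print(cleaned_sample)
```

The same function could be applied to each item of `list_page`. It does not restore the spaces and line breaks that were lost during extraction.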

In [33]:
# Print the last page
print(list_page[-1])

Certificationsi. Master certificate course in Product Design, IGTR Aurangabad (2021)ii. Course on Basic to Advanced Excel training (BizWiz-2022)iii. English Typing Course - 60 WPM (Ratatyping.com)Strengths:i. I have very good learning ability and convert them into action.ii. Confident and Determined.iii. Ability to cope up with different situations.Hobbies:i. Exercise and Workoutii. Watching Moviesiii. Listening to MusicPersonal Information:Name:Mahajan Yogesh VishvasDate of Birth:20thSept 1996Gender:MaleMarital Status:UnmarriedLanguage Known:English, Hindi, MarathiPermanent Address:At Post- Khedgaon Tal- Chalisgaon Dist-Jalgaon(MH)Pin - 424107---------------------------------------------------------------------------------------------------------------------------------------------------------Declaration: I do hereby declare that all the details furnished above are true to thebest of my knowledge and I bear the responsibility for the correctness of abovementioned particular.Place:
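
Once the text of each page is in a list, it can be saved with the text-file techniques from the first part of this notebook. The following is a sketch under stated assumptions: `save_pages_as_text`, the `--- page N ---` separator, and the sample strings are all made up for illustration, and the output goes to a temporary file.

```python
import os
import tempfile


def save_pages_as_text(page_texts, out_path):
    """Write each page's text to out_path, preceded by a page marker line."""
    with open(out_path, "w", encoding="utf-8") as out_file:
        for number, text in enumerate(page_texts, start=1):
            out_file.write(f"--- page {number} ---\n")
            # Ensure every page ends with exactly one newline.
            out_file.write(text.rstrip("\n") + "\n")


sample_pages = ["Text of the first page", "Text of the second page\n"]

fd, out_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)                      # only the path is needed here
save_pages_as_text(sample_pages, out_path)

with open(out_path, encoding="utf-8") as f:
    saved = f.read()
os.remove(out_path)
print(saved)
```

In the notebook, `save_pages_as_text(list_page, "resume.txt")` would write the extracted resume text to a file; `resume.txt` is a placeholder name.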